Abstract

We consider two-person zero-sum stochastic mean payoff games with perfect information, or BWR-games, given by a digraph $G = (V, E)$, with local rewards $r : E \to \mathbb{Z}$, and three types of positions: black $V_B$, white $V_W$, and random $V_R$ forming a partition of $V$. It is a long-standing open question whether a polynomial time algorithm for BWR-games exists, or not, even when $|V_R| = 0$. In fact, a pseudo-polynomial algorithm for BWR-games would already imply their polynomial solvability. In this paper, we show that BWR-games with a constant number of random positions can be solved in pseudo-polynomial time. More precisely, in any BWR-game with $|V_R| = O(1)$, a saddle point in uniformly optimal pure stationary strategies can be found in time polynomial in $|V_W| + |V_B|$, the maximum absolute local reward, and the common denominator of the transition probabilities.


1 Introduction 

1.1 Basic concepts 

We consider two-person zero-sum stochastic games with perfect information and mean payoff. Let $G = (V, E)$ be a digraph whose vertex set $V$ is partitioned into three subsets $V = V_B \cup V_W \cup V_R$ that correspond to black, white, and random positions, controlled, respectively, by two players, Black (the minimizer) and White (the maximizer), and by nature. We also fix a local reward function $r : E \to \mathbb{Z}$, and probabilities $p(v,u)$ for all arcs $(v,u)$ going out of $v \in V_R$. Vertices $v \in V$ and arcs $e \in E$ are called positions and moves, respectively. The game begins at time $t = 0$ in the initial position $s_0 = v_0$. In a general step, at time $t$, we are at position $s_t \in V$. The player who controls $s_t$ chooses an outgoing arc $e_{t+1} = (s_t, v) \in E$, and the game moves to position $s_{t+1} = v$. If $s_t \in V_R$, then an outgoing arc is chosen with the given probability $p(s_t, s_{t+1})$. We assume, in fact without any loss of generality, that every vertex in $G$ has an outgoing arc. (Indeed, if not, one can add loops to terminal vertices.) In general, the strategy of a player is a policy by which (s)he chooses the outgoing arcs from the vertices (s)he controls. This policy may involve knowledge of the previous steps as well as probabilistic decisions. We call a strategy stationary if it does not depend on the history, and pure if it does not involve probabilistic decisions. For this type of games, it will be enough to consider only such strategies, since these games are known to be (polynomially) equivalent [BEGM13a] to the perfect information stochastic games considered by Gillette [Gil57, LL69], for which optimal pure stationary strategies are known to exist.

In the course of this game the players generate an infinite sequence of edges $p = (e_1, e_2, \ldots)$ (a play) and the corresponding integer sequence $r(p) = (r(e_1), r(e_2), \ldots)$ of local rewards. At the end (after infinitely many steps) Black pays White the amount $\phi(r(p))$. Naturally, White's aim is to create a play which maximizes $\phi(r(p))$, while Black tries to minimize it. (Let us note that the local reward function $r : E \to \mathbb{Z}$ may have negative values, and $\phi(r(p))$ may also be negative, in which case White has to pay Black. Let us also note that $r(p)$ is a random variable since random transitions occur at positions in $V_R$.) Here $\phi$ stands for the limiting mean payoff

$$\phi(r(p)) = \liminf_{T \to \infty} \frac{\sum_{i=1}^{T+1} \mathbb{E}[r(e_i)]}{T+1}, \qquad (1)$$

where $\mathbb{E}[r(e_i)]$ is the expected reward incurred at step $i$ of the play.
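To make the limit in (1) concrete, here is a minimal Python sketch (not part of the paper) that computes the partial averages whose liminf defines $\phi$. The function name `running_mean_payoffs` and the alternating reward sequence are hypothetical; a finite prefix can only suggest, never prove, the value of the liminf.

```python
from fractions import Fraction


def running_mean_payoffs(expected_rewards):
    """Partial averages (1/(T+1)) * sum_{i=1}^{T+1} E[r(e_i)] for T = 0, 1, 2, ...

    `expected_rewards` is a finite prefix E[r(e_1)], E[r(e_2)], ... of the play
    (hypothetical input format). Exact rationals avoid rounding noise.
    """
    averages = []
    total = Fraction(0)
    for count, reward in enumerate(expected_rewards, start=1):
        total += reward
        averages.append(total / count)
    return averages


# Hypothetical play cycling through two moves with local rewards 4 and -2.
prefix = [4, -2] * 500
avgs = running_mean_payoffs(prefix)
print([str(a) for a in avgs[:4]])  # ['4', '1', '2', '1']
print(min(avgs[-100:]))            # 1, the liminf suggested by the tail
```

The partial averages oscillate (they exceed 1 after an odd number of steps), which is why (1) uses a liminf rather than a plain limit.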

As usual, a pair of (not necessarily pure or stationary) strategies is a saddle point if neither of the players can improve individually by changing her/his strategy. The corresponding $\phi(r(p))$ is the value of the game with respect to the initial position $v_0$. Such a pair of strategies is called optimal; furthermore, it is called uniformly optimal if it provides the value of the game for any initial position.

This class of BWR-games was introduced in [GKK88]; see also [CH08]. The special case when $V_R = \emptyset$, BW-games, is also known as cyclic games. They were introduced for the complete bipartite digraphs in [Mon76], for all (not necessarily complete) bipartite digraphs in [EM79], and for arbitrary digraphs^1 in [GKK88]. A more special case was considered extensively in the literature under the name of parity games [BV01a, BV01b, GJH04, Hal07, Jur98, JPZ06], and later generalized also to include random positions in [GH08]. A BWR-game is reduced to a minimum mean cycle problem in case $V_W = V_R = \emptyset$; see, e.g., [Kar78]. If one of the sets $V_B$ or $V_W$ is empty, we obtain a Markov decision process (MDP), which can be expressed as a linear program; see, e.g., [MO70]. Finally, if both are empty, $V_B = V_W = \emptyset$, we get a weighted Markov chain. In the special case when all rewards are zero except at special positions called terminals, each of which only has a single outgoing arc forming a self-loop, we get a stochastic terminal payoff game, and when the self-loops have 0/1 payoffs, and every random position has only two outgoing arcs with probability 1/2 each, we obtain the so-called simple stochastic games (SSGs), introduced by Condon [Con92, Con93] and considered in several papers [GH08, Hal07]. In the latter games, the objective of White is to maximize the probability of reaching the terminal with payoff 1, while Black wants to minimize this probability. Recently, it was shown that Gillette games with perfect information (and hence BWR-games, by [BEGM13a]) are equivalent to SSGs under polynomial-time reductions [AM09]. Thus, by results of Halman [Hal07], all these games can be solved in randomized strongly subexponential time $2^{O(\sqrt{n_d \log n_d})}$, where $n_d = |V_B| + |V_W|$ is the number of deterministic vertices. On the other hand, if the number of random positions is constant, there are polynomial time algorithms for SSGs. Gimbert and Horn gave an $O(|V_R|!\,|V|\,|E| + |p|)$ algorithm, where $|p|$ is the maximum bit-length of a transition probability. Chatterjee et al. [CdAH09] pointed out that a variant of strategy iteration can be implemented to solve SSGs in time $4^{|V_R|}\,|V_R|^{O(1)}\,|V|^{O(1)}$. Dai and Ge gave a randomized algorithm with expected running time $\sqrt{|V_R|!}\,|V|^{O(1)}$. Ibsen-Jensen and Miltersen [IJM12] improved these bounds by showing that a variant of value iteration solves SSGs in time $O(|V_R|\,2^{|V_R|}(|V_R| \log |V_R| + |V|))$. For BW-games several pseudo-polynomial and subexponential algorithms are known [GKK88, KL93, ZP96, Pis99, BV01a, BV01b, BV07, Hal07]; see also [JPZ08] for parity games. Besides their many applications (see, e.g., [Lit96, Jur00]), all these games are of interest to Complexity Theory: Karzanov and Lebedev [KL93] (see also [ZP96]) proved that the decision problem "whether the value of a BW-game is positive" is in the intersection of NP and co-NP. Yet, no polynomial algorithm is known for these games; see, e.g., the survey by Vorobyov [Vor08]. A similar complexity claim can be shown to hold for SSGs and BWR-games; see [AM09, BEGM13a].

1.2 Main result 

The problem of developing an efficient algorithm for stochastic games with perfect information was mentioned as an open problem in the survey [RF91]. While there are numerous pseudo-polynomial algorithms known for the BW-case, it is a challenging open question whether a pseudo-polynomial algorithm exists for BWR-games, as the existence of such an algorithm would also imply the polynomial solvability of this class of games [AM09]. Our main result can be viewed as a partial solution of this problem, for the case when the number of random positions is fixed.

^1 In fact, BW-games on arbitrary digraphs can be polynomially reduced to BW-games on bipartite graphs [BEGM13a]; moreover, the latter class can further be reduced to BW-games on complete bipartite graphs [CHKN12].

For a BWR-game $\mathcal{G}$ let us denote by $n = |V_W| + |V_B| + |V_R|$ the number of positions, by $k = |V_R|$ the number of random positions, and assume that all local rewards are integral with maximum absolute value $R$, and all transition probabilities are rational with common denominator $D$. The main result of this paper is as follows.

Theorem 1 A BWR-game $\mathcal{G}$ can be solved in $(nDk)^{O(k)}\, R \cdot \mathrm{polylog}\, R$ time.


As we shall see from our proof in Section 4, one can replace $n^{O(k)}$ in the above theorem by $\mathrm{poly}(n)\,\nu(\mathcal{G})^{O(k)}$, where $\nu(\mathcal{G})$ is defined by a parametrized BW-game obtained from $\mathcal{G}$ (see its precise definition in Section 4). For stochastic terminal payoff games with $t$ terminals, it can be shown that $\nu(\mathcal{G}) \le t + 1$. Thus, Theorem 1 extends the fixed-parameter tractability of simple stochastic games with respect to the number of random positions [CdAH09, GH08, IJM12] and the pseudo-polynomial solvability of deterministic mean payoff games [GKK88, Pis99, ZP96].

It is important to note that the above result requires a new technique to solve BWR-games. Approximating by discounted games cannot give the claimed complexity: the example in [BEGM13b] shows that one needs to choose the discount factor $\beta$ exponentially (in $n$) close to 1 even if there is only a single random position.

Theorem 1 combined with the reduction in [BEF+11] implies that we can obtain an $\varepsilon$-saddle point (that is, a pair of stationary strategies that approximate the value within an error of $\varepsilon$) for a BWR-game in time $\mathrm{poly}(n, \ldots)$.


1.3 Main ideas of the proof 

Our approach for proving Theorem 1 relies heavily on reducing the computation of uniformly optimal strategies for a general BWR-game to the case of ergodic BWR-games, i.e., those in which the (optimal) value does not depend on the initial position, and showing that this special case can be solved in … $\cdot\,\mathrm{poly}(n, \log R)$ time.

Our algorithm for the ergodic case is based on potential transformations, which change the local rewards without changing the normal form of the game; see Section 2.3. We would like to bring the local rewards to such a form that every locally optimal move is also globally optimal. Starting from zero potentials, the algorithm keeps selecting a subset of positions and reducing their potentials until either the locally optimal rewards (maximum for White, minimum for Black, and average for Random) at different positions become sufficiently close to each other, or a proof of non-ergodicity is obtained in the form of a certain partition of the positions. In more detail, the algorithm proceeds in phases. In one phase, we divide the range of current locally optimal rewards into four regions, and keep reducing the potentials of some of the positions such that no position can escape from the middle two regions. The phase ends when either the top or bottom region becomes empty, or a proof of non-ergodicity is found. Note that an algorithm for BW-games, also based on potential reductions, was suggested in [GKK88]. However, as mentioned in [GKK88], this algorithm cannot be extended to BWR-games, since in this algorithm the middle region consists of exactly one level. Thus, random positions with arcs going out of the middle level may have to escape from that level after the potential reduction. In our algorithm, we overcome this difficulty by relaxing the middle level to a set of levels.

The upper bound on the running time consists of three technical parts. The first one is to show that if the number of iterations becomes too large, then there is a large enough potential gap to ensure non-ergodicity. In the second part, we show that the range of potentials can be kept sufficiently small throughout the algorithm, namely $\|x^*\|_\infty \le nRk(2D)^k$, and hence the range of the transformed rewards does not explode. The third part concerns the required accuracy. We show that it is enough to use an accuracy of

$$\varepsilon < \frac{1}{(k+1)^2\, n^2\, (2D)^{2k+4}} \qquad (2)$$

in order to guarantee that the algorithm either finds the exact value or discovers non-ergodicity. As we shall see below, this accuracy is also enough to find an optimal strategy in the non-ergodic case.
This contrasts with the fact that there is an example with $k = 1$ in which the difference between two distinct values is exponentially small in $n$; see [BEGM13b]. We also show a lower bound of $D^{\Omega(k)}$ on the running time of the algorithm of Theorem 1 by constructing a series of games with only random positions (that is, weighted Markov chains).

For a non-ergodic game, the above algorithm can be used to find the classes of positions with the largest and smallest values, but cannot find other classes, since they interact via random positions. Note that a stochastic terminal game with $k$ random positions can be reduced, by guessing the order of values of the random positions, to $k!$ deterministic reachability games, each of which can be solved in polynomial time [GH08]. In contrast, in the case of BWR-games, even if we know the actual values (not only the order) of all the random positions, it is not clear how to find optimal strategies realizing these values. Nevertheless, we show that if the values are known, then the problem can be reduced to the ergodic case. Then, to utilize the algorithm for the ergodic case, we employ the idea of solving parametrized games to reduce the search space for all values into a set of $k$-tuples of rational intervals of cardinality at most …. Using such a set of tuples we can iteratively replace random positions by self-loops on which the local reward is a guessed value. In more detail:

1. We iterate steps 2, 3, and 4 below over the random positions in the guessed order, keeping only the positions with highest rank (and hence having the same optimal value), and deleting all the other random positions. We iterate until no random positions remain, in which case we solve a BW-game.

2. We consider the situation when all the kept random positions are replaced by a self-loop with local reward parameter $x$; we show that the value of any position in the resulting game defines an interval on the real line, as $x$ changes from $-\infty$ to $+\infty$. We identify a set of at most $\nu(\mathcal{G}) + 1$ maximal intervals, in each of which the values of different positions as functions of $x$ are either constant or equal to $x$ in the entire interval.

3. Since we do not know the real value of $x$, we guess among the identified intervals one that contains $x$; for the guessed interval, we provide optimal strategies for the positions that have values above the lower bound of the interval, assuming our guess is correct.

4. Each of our guesses above yields a pair of strategies that can be verified for optimality by solving two MDPs.

Note that the number of guesses is bounded by 

Remark 1 It is interesting to note that in the BW-model the ergodic case is as difficult as the general one [GKK88], while in the BWR-model the non-ergodic case seems more difficult; see Section 4.

2 Preliminaries 

2.1 Markov chains 

Let $(G = (U, E), P)$ be a Markov chain, and let $C_1, \ldots, C_k \subseteq U$ be the vertex sets of the strongly connected components (classes) of $G$. For $i \ne j$, let us (standardly) write $C_i \prec C_j$ if there is an arc $(v, u) \in E$ such that $v \in C_i$ and $u \in C_j$. The components $C_i$ such that there is no $C_j$ with $C_i \prec C_j$ are called the absorbing (or recurrent) classes, while the other components are called transient or non-recurrent. Let $J = \{i \mid C_i \text{ is absorbing}\}$, $A = \bigcup_{i \in J} C_i$, and $T = U \setminus A$. For $X, Y \subseteq U$, a matrix $H \in \mathbb{R}^{U \times U}$, and a vector $h \in \mathbb{R}^U$, we denote by $H[X; Y]$ the submatrix of $H$ induced by $X$ as rows and $Y$ as columns, and by $h[X]$ the subvector of $h$ induced by $X$. Let $I = I[U; U]$ be the $|U| \times |U|$ identity matrix, and $e = e[U]$ the vector of all ones of dimension $|U|$. For simplicity, we drop the indices of $I[\cdot, \cdot]$ and $e[\cdot]$ when they are understood from the context. Then $P[C_i; C_j] = 0$ if $C_j \prec C_i$, and hence, in particular, $P[C_i; C_i]e = e$ for all $i \in J$, while $P[T; T]e$ has at least one component of value strictly less than 1.

The following are well-known facts about the limiting matrix $P^* = \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} P^t$ and the limiting distribution $p_w = e_w^{\top} P^*$, when the initial distribution is the $w$th unit vector $e_w$ of dimension $|U|$ (see, e.g., [KS63]):

(L1) $p_w[A] \ge 0$ and $p_w[T] = 0$;

(L2) $\lim_{t \to \infty} P^t[U; T] = 0$;

(L3) $\operatorname{rank}(I - P[C_i; C_i]) = |C_i| - 1$ for all $i \in J$, $\operatorname{rank}(I - P[T; T]) = |T|$, and $(I - P[T; T])^{-1} = \sum_{t=0}^{\infty} P[T; T]^t$;

(L4) the absorption probabilities $y_i \in [0,1]^U$ into a class $C_i$, $i \in J$, are given by the unique solution of the linear system $(I - P[T; T])\, y_i[T] = P[T; C_i]\, e$, $y_i[C_i] = e$, and $y_i[C_j] = 0$ for $j \in J$ with $j \ne i$;

(L5) the limiting distribution $p_w \in [0,1]^U$ is given by the unique solution of the linear system $p_w[C_i](I - P[C_i; C_i]) = 0$, $p_w[C_i]\, e = y_i(w)$ for all $i \in J$, and $p_w[T] = 0$.
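Fact (L4) reduces absorption probabilities to a single linear system. The following is a minimal sketch, assuming a dense list-of-lists transition matrix and a caller who already knows the transient set $T$ and the target class $C$; the helper names and the four-state "gambler's ruin" chain (states 0 and 3 absorbing) are hypothetical. It solves the system exactly with Python's `fractions` module.

```python
from fractions import Fraction


def solve_exact(a, b):
    """Solve a·y = b by Gauss-Jordan elimination over the rationals.

    Assumes `a` is nonsingular, which (L3) guarantees for a = I - P[T;T].
    """
    n = len(a)
    rows = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [Fraction(v) / lead for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[col])]
    return [rows[r][n] for r in range(n)]


def absorption_probabilities(P, transient, absorbing_class):
    """(L4): solve (I - P[T;T]) y[T] = P[T;C] e, then set y[C] = e and y = 0 elsewhere."""
    a = [[(1 if i == j else 0) - P[i][j] for j in transient] for i in transient]
    b = [sum(P[i][j] for j in absorbing_class) for i in transient]
    y = [Fraction(0)] * len(P)
    for i, value in zip(transient, solve_exact(a, b)):
        y[i] = value
    for j in absorbing_class:
        y[j] = Fraction(1)
    return y


half = Fraction(1, 2)
P = [
    [1, 0, 0, 0],        # state 0: absorbing class {0}
    [half, 0, half, 0],  # state 1: transient
    [0, half, 0, half],  # state 2: transient
    [0, 0, 0, 1],        # state 3: absorbing class {3}
]
print([str(v) for v in absorption_probabilities(P, [1, 2], [3])])  # ['0', '1/3', '2/3', '1']
```

Because the chain eventually leaves $T$ (by (L2)), the absorption probabilities into the two absorbing classes sum to 1 at every state.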

2.2 BWR-games, solvability and ergodicity 

A BWR-game $\mathcal{G} = (G, p, r)$ is given by a digraph $G = (V, E)$, where $V = V_W \cup V_B \cup V_R$ is a partition of the vertices; $G$ may have loops and multiple arcs, but no terminal positions, i.e., positions of out-degree 0; $p(v,u)$ are probabilities for $v \in V_R$, $(v,u) \in E$ satisfying $\sum_{u \mid (v,u) \in E} p(v,u) = 1$ for all $v \in V_R$; and $r : E \to \mathbb{Z}$ is a local reward function. For convenience we will also assume that $p(v,u) > 0$ whenever $(v,u) \in E$ and $v \in V_R$, and set $p(v,u) = 0$ for $(v,u) \notin E$.

Standardly, we define a strategy $s_W \in S_W$ (respectively, $s_B \in S_B$) as a mapping that assigns a position $u \in V$, such that $(v,u) \in E$, to each position $v \in V_W$ (respectively, $v \in V_B$). A pair of strategies $s = (s_W, s_B)$ is called a situation. Given a BWR-game $\mathcal{G} = (G, p, r)$ and a situation $s = (s_W, s_B)$, we obtain a weighted Markov chain $\mathcal{G}_s = (P_s, r)$ with transition matrix $P_s$ in the obvious way:

$$P_s(v,u) = \begin{cases} 1 & \text{if } (v \in V_W \text{ and } u = s_W(v)) \text{ or } (v \in V_B \text{ and } u = s_B(v)), \\ 0 & \text{if } (v \in V_W \text{ and } u \ne s_W(v)) \text{ or } (v \in V_B \text{ and } u \ne s_B(v)), \\ p(v,u) & \text{if } v \in V_R. \end{cases}$$

In the obtained Markov chain $\mathcal{G}_s = (P_s, r)$, we define the limiting (mean) effective payoff $\mu_s(v)$ as

$$\mu_s(v) = \sum_{w \in V} p^*(v,w) \sum_{u} P_s(w,u)\, r(w,u), \qquad (3)$$

where $p^*(v,w)$ is the limit probability in $\mathcal{G}_s$ to be at position $w$ when the initial position is $v$ (see Section 2.1 for more details). It is known [Gil57, LL69] that every such game has a pair of (uniformly optimal) pure stationary strategies $(s_W^*, s_B^*)$ such that for any other pair of stationary strategies $(s_W, s_B)$ and for every initial position $v$, the following holds:

$$\mu_{(s_W, s_B^*)}(v) \le \mu_{(s_W^*, s_B^*)}(v) \le \mu_{(s_W^*, s_B)}(v).$$

The quantity $\mu_{(s_W^*, s_B^*)}(v)$ is called the value of the game starting from position $v$, and will be denoted by $\mu_{\mathcal{G}}(v)$, or simply by $\mu(v)$ if the game is clear from the context. The value $\mu_{\mathcal{G}}(v)$ may depend on $v$. The BWR-game $\mathcal{G} = (G, p, r)$ is called ergodic if the value $\mu(v)$ is the same for all initial positions $v \in V$.
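The construction of $P_s$ and formula (3) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's procedure: a strategy is encoded (hypothetically) as the index of the chosen outgoing arc, which sidesteps parallel arcs, and $P^*$ is approximated by the Cesàro average $\frac{1}{T}\sum_{t<T} P_s^t$, so the result carries an $O(1/T)$ error rather than being exact.

```python
def markov_chain_of_situation(owner, arcs, strategy):
    """Build P_s and q_s(w) = sum_u P_s(w,u) r(w,u) for a situation s.

    owner[v] is 'W', 'B' or 'R'; arcs[v] is a list of (u, reward, prob), where
    prob is only used at random positions; strategy[v] is the index of the arc
    chosen at a White or Black position v (hypothetical encoding).
    """
    n = len(owner)
    P = [[0.0] * n for _ in range(n)]
    q = [0.0] * n
    for v in range(n):
        for idx, (u, reward, prob) in enumerate(arcs[v]):
            if owner[v] == 'R':
                weight = prob
            else:
                weight = 1.0 if idx == strategy[v] else 0.0
            P[v][u] += weight
            q[v] += weight * reward
    return P, q


def mean_payoff(P, q, horizon=20000):
    """Approximate mu_s = P* q by the Cesaro average (1/T) * sum_{t<T} P^t q."""
    n = len(q)
    current = list(q)          # P^0 q
    total = [0.0] * n
    for _ in range(horizon):
        total = [acc + val for acc, val in zip(total, current)]
        current = [sum(P[v][u] * current[u] for u in range(n)) for v in range(n)]
    return [acc / horizon for acc in total]


# Hypothetical game: 0 is White, 1 is Black, 2 is Random (probability 1/2 per arc).
owner = ['W', 'B', 'R']
arcs = [
    [(1, 2, None), (2, -1, None)],   # White: to 1 with reward 2, or to 2 with reward -1
    [(0, 0, None), (1, 3, None)],    # Black: to 0 with reward 0, or a self-loop with reward 3
    [(0, 4, 0.5), (1, 0, 0.5)],      # Random: to 0 (reward 4) or to 1 (reward 0)
]
P, q = markov_chain_of_situation(owner, arcs, {0: 0, 1: 0})
print([round(mu, 3) for mu in mean_payoff(P, q)])   # [1.0, 1.0, 1.0]
```

In this situation positions 0 and 1 form a 2-cycle with rewards 2 and 0, and the random position falls into it, so every position has mean payoff 1. The Cesàro average is used because $P_s^t$ itself need not converge for periodic chains.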

2.3 Potential transformations and canonical forms 

Given a BWR-game $\mathcal{G} = (G, p, r)$, let us introduce a mapping $x : V \to \mathbb{R}$, whose values $x(v)$ will be called potentials, and define the transformed reward function $r_x : E \to \mathbb{R}$ as

$$r_x(v,u) = r(v,u) + x(v) - x(u), \quad \text{where } (v,u) \in E. \qquad (4)$$

It is not difficult to verify that the two normal forms of the obtained game $\mathcal{G}^x$ and the original game $\mathcal{G}$ are the same, and hence the games $\mathcal{G}^x$ and $\mathcal{G}$ are equivalent (see [BEGM13a]). In particular, their optimal (pure stationary) strategies coincide, and their value functions also coincide: $\mu_{\mathcal{G}^x} = \mu_{\mathcal{G}}$.
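One quick way to see why (4) cannot change mean payoffs, at least along deterministic cycles, is that the potential terms telescope around any closed walk. The sketch below checks this numerically; the arc-dictionary encoding and the sample numbers are hypothetical, and it only illustrates, rather than proves, the equivalence cited from [BEGM13a].

```python
from fractions import Fraction


def transform_rewards(rewards, x):
    """Return r_x with r_x(v,u) = r(v,u) + x(v) - x(u) for every arc (v,u), as in (4)."""
    return {(v, u): r + x[v] - x[u] for (v, u), r in rewards.items()}


def cycle_mean(rewards, cycle):
    """Mean local reward along the closed walk cycle[0] -> cycle[1] -> ... -> cycle[0]."""
    walk = list(zip(cycle, cycle[1:] + cycle[:1]))
    return Fraction(sum(rewards[arc] for arc in walk), len(walk))


rewards = {('a', 'b'): 5, ('b', 'c'): -1, ('c', 'a'): 2, ('b', 'a'): 0}
x = {'a': 3, 'b': -2, 'c': Fraction(7, 2)}
r_x = transform_rewards(rewards, x)
for cycle in (['a', 'b', 'c'], ['a', 'b']):
    # Each line shows the same mean before and after the transformation.
    print(cycle, cycle_mean(rewards, cycle), cycle_mean(r_x, cycle))
```

The individual rewards change a lot (for instance $r_x(a,b) = 10$ instead of 5), but each cycle mean stays the same.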
It is known that for BW-games there exists a potential transformation such that, in the obtained game, the locally optimal strategies are globally optimal, and hence the value and optimal strategies become obvious [GKK88]. This result was extended to the more general class of BWR-games in [BEGM13a]: in the transformed game, the equilibrium value $\mu_{\mathcal{G}}(v) = \mu_{\mathcal{G}^x}(v)$ is given simply by the maximum local reward for $v \in V_W$, the minimum local reward for $v \in V_B$, and the average local reward for $v \in V_R$. In this case we say that the transformed game is in canonical form. To define this more formally, let us use the following notation throughout this section: given functions $f : E \to \mathbb{R}$ and $g : V \to \mathbb{R}$, we define the functions $M[f], M[g] : V \to \mathbb{R}$ by

$$M[f](v) = \begin{cases} \max_{u \mid (v,u) \in E} f(v,u), & \text{for } v \in V_W, \\ \min_{u \mid (v,u) \in E} f(v,u), & \text{for } v \in V_B, \\ \sum_{u \mid (v,u) \in E} p(v,u)\, f(v,u), & \text{for } v \in V_R, \end{cases}$$

$$M[g](v) = \begin{cases} \max_{u \mid (v,u) \in E} g(u), & \text{for } v \in V_W, \\ \min_{u \mid (v,u) \in E} g(u), & \text{for } v \in V_B, \\ \sum_{u \mid (v,u) \in E} p(v,u)\, g(u), & \text{for } v \in V_R. \end{cases}$$
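The operator $M$ is straightforward to evaluate. In the minimal sketch below, the encoding is hypothetical: `arcs[v]` lists pairs `(u, p)` with `p = None` at White and Black positions, and there are no parallel arcs, so an arc function can be keyed by `(v, u)`. The reward function is chosen so that $M[r]$ is constant; by Proposition 1, such a game is ergodic with value equal to that constant.

```python
from fractions import Fraction


def M_arc(owner, arcs, f):
    """M[f](v): max over moves for White, min for Black, p-weighted average for Random."""
    result = {}
    for v, out in arcs.items():
        if owner[v] == 'W':
            result[v] = max(f[(v, u)] for u, _ in out)
        elif owner[v] == 'B':
            result[v] = min(f[(v, u)] for u, _ in out)
        else:
            result[v] = sum(p * f[(v, u)] for u, p in out)
    return result


def M_vertex(owner, arcs, g):
    """M[g](v): the same operator applied to the values g(u) at the heads of the moves."""
    lifted = {(v, u): g[u] for v, out in arcs.items() for u, _ in out}
    return M_arc(owner, arcs, lifted)


half = Fraction(1, 2)
owner = {'w': 'W', 'b': 'B', 'r': 'R'}
arcs = {'w': [('b', None), ('r', None)],
        'b': [('w', None), ('r', None)],
        'r': [('w', half), ('b', half)]}
r = {('w', 'b'): 3, ('w', 'r'): 1, ('b', 'w'): 5, ('b', 'r'): 3, ('r', 'w'): 4, ('r', 'b'): 2}
print(M_arc(owner, arcs, r))                            # every entry equals 3
print(M_vertex(owner, arcs, {'w': 1, 'b': 0, 'r': 2}))   # w: 2, b: 1, r: 1/2
```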

We say that a BWR-game $\mathcal{G}$ is in canonical form if there exist vectors $\mu, x \in \mathbb{R}^V$ such that

(C1) $\mu = M[\mu] = M[r_x]$, and

(C2) for every $v \in V_W \cup V_B$, every move $(v,u) \in E$ such that $\mu(v) = r_x(v,u)$ must also have $\mu(v) = \mu(u)$; in other words, every locally optimal move $(v,u)$ is globally optimal.

Canonical forms were defined for BW-games in [GKK88], and extended to BWR-games and other classes of stochastic games in [BEGM13a]. It was shown in [GKK88] that there always exists a potential transformation $x$ such that the optimal local rewards $M[r_x](v)$ in a BW-game are equal to the game's value $\mu(v)$ at each vertex $v$. This result was extended in [BEGM13a] to the BWR-case.

Theorem 2 ([BEGM13a]) For each BWR-game $\mathcal{G}$, there is a potential transformation $x$ bringing $\mathcal{G}$ to canonical form. Furthermore, in a game in canonical form we have $\mu_{\mathcal{G}} = M[r]$.

In this paper we will provide an algorithm for finding such a potential transformation in the 
ergodic case. 

Proposition 1 If there exists a constant $m$ such that $M[r](v) = m$ for all $v \in V$, then (i) every locally optimal strategy is optimal and (ii) the game is ergodic: $m = \mu(v)$ is its value for every initial position $v \in V$.

Proof Indeed, if White (Black) applies a locally optimal strategy, then after every own move (s)he will get (pay) $m$, while for each move of the opponent the local reward will be at least (at most) $m$, and finally, for each random position the expected local reward is $m$. Thus, every locally optimal strategy of a player is optimal. Furthermore, if both players choose their optimal strategies, then the expected local reward $\mathbb{E}[r(e_i)]$ equals $m$ for every step $i$. Hence, the value of the game $\lim_{T \to \infty} \frac{1}{T+1} \sum_{i=1}^{T+1} \mathbb{E}[r(e_i)]$ equals $m$. □

2.4 Sufficient conditions for ergodicity of BWR-games 

A digraph $G = (V = V_W \cup V_B \cup V_R, E)$ is called ergodic if all BWR-games $\mathcal{G} = (G, p, r)$ on $G$ are ergodic. We will give a simple characterization of ergodic digraphs, which, obviously, provides a sufficient condition for ergodicity of BWR-games. For BW-games (that is, in case $V_R = \emptyset$) such a characterization was given in [GL89].

In addition to the partition $V = V_W \cup V_B \cup V_R$, let us consider another partition $\Pi : V = V^+ \cup V^- \cup V^0$ with the following properties:

(i) The sets $V^+$ and $V^-$ are not empty (while $V^0$ might be empty).

(ii) There is no arc $(v,u) \in E$ such that either $v \in (V_W \cup V_R) \cap V^-$ and $u \notin V^-$, or $v \in (V_B \cup V_R) \cap V^+$ and $u \notin V^+$. In other words, White cannot leave $V^-$, Black cannot leave $V^+$, and there are no random moves leaving $V^+$ or $V^-$.

(iii) For each $v \in V_W \cap V^+$ (respectively, $v \in V_B \cap V^-$) there is a move $(v,u) \in E$ such that $u \in V^+$ (respectively, $u \in V^-$). In other words, White (Black) cannot be forced to leave $V^+$ (respectively, $V^-$).

In particular, the properties above imply that the induced subgraphs $G[V^+]$ and $G[V^-]$ have no terminal vertices.

A partition $\Pi : V = V^+ \cup V^- \cup V^0$ satisfying (i), (ii), and (iii) will be called a contra-ergodic partition for the digraph $G = (V_W \cup V_B \cup V_R, E)$.

Theorem 3 A digraph $G$ is ergodic if and only if it has no contra-ergodic partition.

Proof “Only if” part. Let $\Pi : V = V^+ \cup V^- \cup V^0$ be a contra-ergodic partition of $G$. Let us assign arbitrary positive probabilities to random moves such that $\sum_{u \mid (v,u) \in E} p(v,u) = 1$ for all $v \in V_R$.

We still have to assign a local reward $r(v,u)$ to each move $(v,u) \in E$. Let us define $r(v,u) = 1$ whenever $v, u \in V^+$, $r(v,u) = -1$ whenever $v, u \in V^-$, and $r(v,u) = 0$ otherwise. Clearly, if the initial position is in $V^+$ (respectively, in $V^-$), then the value of the obtained game is $1$ (respectively, $-1$). Hence, the corresponding game is not ergodic.

“If” part. Given a non-ergodic BWR-game $\mathcal{G} = (G, p, r)$, the value function $\mu_{\mathcal{G}}$ is not constant. Let $\mu_W$ and $\mu_B$ denote its maximum and minimum values, respectively. Then, let us set $V^+ = \{v \in V \mid \mu(v) = \mu_W\}$, $V^- = \{v \in V \mid \mu(v) = \mu_B\}$, and $V^0 = V \setminus (V^+ \cup V^-)$. It is not difficult to verify that the obtained partition $\Pi : V = V^+ \cup V^- \cup V^0$ is contra-ergodic. □
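Conditions (i) to (iii) are purely combinatorial, so a candidate partition can be checked directly from the successor lists. The checker below is a hypothetical helper, not part of the paper. As a sanity check, it confirms by brute force that the small complete tripartite digraph with one position of each type has no contra-ergodic partition, in line with the remark after Proposition 2. Adding self-loops at the White and Black positions creates one.

```python
from itertools import product


def is_contra_ergodic(owner, succ, plus, minus):
    """Check conditions (i)-(iii) for V = V+ ∪ V- ∪ V0, where V0 is everything else."""
    plus, minus = set(plus), set(minus)
    if not plus or not minus or plus & minus:            # (i), plus disjointness
        return False
    for v, heads in succ.items():
        side = plus if v in plus else minus if v in minus else None
        if side is None:                                 # positions in V0 are unconstrained
            continue
        trapped = ('B', 'R') if side is plus else ('W', 'R')
        stayer = 'W' if side is plus else 'B'
        if owner[v] in trapped and any(u not in side for u in heads):
            return False                                 # (ii) violated
        if owner[v] == stayer and not any(u in side for u in heads):
            return False                                 # (iii) violated
    return True


def contra_ergodic_partitions(owner, succ):
    """Brute force over all 3^|V| labelings; only sensible for tiny digraphs."""
    vertices = list(succ)
    found = []
    for labels in product('+-0', repeat=len(vertices)):
        plus = {v for v, s in zip(vertices, labels) if s == '+'}
        minus = {v for v, s in zip(vertices, labels) if s == '-'}
        if is_contra_ergodic(owner, succ, plus, minus):
            found.append((plus, minus))
    return found


owner = {'w': 'W', 'b': 'B', 'r': 'R'}
tripartite = {'w': ['b', 'r'], 'b': ['w', 'r'], 'r': ['w', 'b']}
with_loops = {'w': ['w', 'r'], 'b': ['b', 'r'], 'r': ['w', 'b']}
print(contra_ergodic_partitions(owner, tripartite))   # []
print(contra_ergodic_partitions(owner, with_loops))   # [({'w'}, {'b'})]
```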

The “only if” part can be strengthened as follows.

A contra-ergodic decomposition of $\mathcal{G}$ is a contra-ergodic partition $\Pi : V = V^+ \cup V^- \cup V^0$ such that $M[r](v) > M[r](u)$ for every $v \in V^+$ and $u \in V^-$.

Proposition 2 Given a BWR-game $\mathcal{G}$ whose graph has a contra-ergodic partition, if $M[r](v) > M[r](u)$ for every $v \in V^+$, $u \in V^-$, then $\mu(v) > \mu(u)$ for every $v \in V^+$, $u \in V^-$. In particular, $\mathcal{G}$ is not ergodic.

Proof Let us choose a number $\theta$ such that $M[r](v) > \theta > M[r](u)$ for every $v \in V^+$ and $u \in V^-$; it exists because the set $V$ of positions is finite. Obviously, properties (i), (ii), and (iii) imply that White (Black) can guarantee more (less) than $\theta$ for every initial position $v \in V^+$ (respectively, $v \in V^-$). Hence, $\mu(v) > \theta > \mu(u)$ for every $v \in V^+$ and $u \in V^-$. □

For example, no contra-ergodic partition can exist if $G = (V_W \cup V_B \cup V_R, E)$ is a complete tripartite digraph.


3 Ergodic BWR-games 

3.1 Description of the pumping algorithm 

Given a BWR-game $\mathcal{G} = (G, p, r)$, let us compute $m(v) = M[r](v)$ for all $v \in V$. The algorithm proceeds in phases. In each phase we iteratively change the potentials such that either the range of $m$ is reduced by a factor of 3/4, or we discover a contra-ergodic partition. The starting local reward function in the next phase is obtained by transforming the initial rewards by the potential function obtained in the current phase. Once the range becomes small enough, we stop repeating these phases, which allows us to conclude that the game is ergodic.

Now we describe informally the steps in one phase. The general step of a phase, called pumping, consists of reducing all potentials of the positions with $m$-values in the upper half of the range by the same constant $\delta$; we say in this case that those positions are pumped. It will not be difficult to show that the $m$-values of all positions in this upper half can only decrease, and by at most $\delta$, while the $m$-values of all other positions can only increase, also by at most $\delta$. In each step we will choose $\delta$ as the largest constant such that no $m$-value leaves the middle half (that is, the second and third quarters) of the range. It can happen that $\delta = +\infty$; in this case, we can construct a contra-ergodic decomposition proving that the game is non-ergodic.

Next, we give a more formal description of phase $h = 0, 1, \ldots$. Let us denote by $x^h$ the potential at the end of phase $h - 1$, where we set $x^0 = 0$. Given a function $f : V \to \mathbb{R}$, let us denote by $f^+$ and $f^-$ its maximum and minimum values, respectively. Throughout, we will denote by $[m] = [m^-, m^+]$ and $[r] = [r^-, r^+]$ the ranges of the functions $m$ and $r$, respectively, and let $M = m^+ - m^-$ and $R = r^+ - r^-$.

Given a potential function $x : V \to \mathbb{R}$, we will denote by $m_x$, $M_x$, etc., the functions $m$, $M$, etc., with $r$ replaced by the transformed reward $r_x$ in the above definitions. Given a subset $I \subseteq [m]$, let $V_x(I) = \{v \in V \mid m_x(v) \in I\} \subseteq V$. In the following algorithm, the set $I$ will always be a closed or semi-closed interval within $[m]$ (and not within $[m_x]$). For convenience, we will write $(\cdot)_{x^h}$ as $(\cdot)_h$, where $(\cdot)$ could be $m$, $r$, $r^+$, etc. (e.g., $m_h^+ = m_{x^h}^+$).

Let $m_h^- = t_0 < t_1 < t_2 < t_3 < t_4 = m_h^+$ be thresholds defined by

$$t_i = m_h^- + \frac{i}{4} M_h, \quad i = 0, 1, 2, 3, 4, \quad \text{where } M_h = m_h^+ - m_h^-. \qquad (5)$$
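Formula (5) splits the current range of $m_h$ into four equal quarters. The helper below is a minimal sketch (the name `thresholds` is hypothetical). It reproduces, from the $m$-values quoted in the Figure 1 example below, the thresholds of both phases.

```python
from fractions import Fraction


def thresholds(m_values):
    """t_i = m^- + (i/4) * M for i = 0..4, where M = m^+ - m^-, as in (5)."""
    low, high = min(m_values), max(m_values)
    span = high - low
    return [low + Fraction(i, 4) * span for i in range(5)]


# Phase h = 0 of the Figure 1 example: m_x(W) = 4, m_x(B) = -4, m_x(R) = 0.
print([str(t) for t in thresholds([4, -4, 0])])   # ['-4', '-2', '0', '2', '4']
# Phase h = 1: m_x(W) = 3, m_x(B) = 0, m_x(R) = -2, so M_1 = 5.
print([str(t) for t in thresholds([3, 0, -2])])   # ['-2', '-3/4', '1/2', '7/4', '3']
```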

Let us introduce $\tilde{x}(v) = x(v) - \delta$ for $v \in V_x[t_2, t_4]$ and $\tilde{x}(v) = x(v)$ for $v \in V_x[t_0, t_2)$. Let us then introduce the notation $m_x^\delta$ for $m_{\tilde{x}}$. This notation may look complicated, but it is necessary since $m_{\tilde{x}}$ depends on $x$ and $\delta$ in a complicated way.

It is clear that $\delta$ can be computed in linear time, and we have $m_x^\delta(v) \ge t_1$ for all $v \in V_x[t_2, t_4]$ and $m_x^\delta(v) \le t_3$ for all $v \in V_x[t_0, t_2)$, where $m_x^\delta(v)$ is the new value of $m_x(v)$ after all potentials in $V_x[t_2, t_4]$ have been reduced by $\delta$. The value of $m_x^\delta(v)$ is given by the following formula:


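$$m_x^\delta(v) = \begin{cases}
\max\Big\{ \max\limits_{(v,u) \in E,\ u \in V_x[t_2,t_4]} r_x(v,u),\ \max\limits_{(v,u) \in E,\ u \in V_x[t_0,t_2)} r_x(v,u) - \delta \Big\} & \text{for } v \in V_W \cap V_x[t_2,t_4], \\[2ex]
\min\Big\{ \min\limits_{(v,u) \in E,\ u \in V_x[t_2,t_4]} r_x(v,u),\ \min\limits_{(v,u) \in E,\ u \in V_x[t_0,t_2)} r_x(v,u) - \delta \Big\} & \text{for } v \in V_B \cap V_x[t_2,t_4], \\[2ex]
\sum\limits_{(v,u) \in E} p(v,u)\, r_x(v,u) - \delta \sum\limits_{(v,u) \in E,\ u \in V_x[t_0,t_2)} p(v,u) & \text{for } v \in V_R \cap V_x[t_2,t_4], \\[2ex]
\max\Big\{ \max\limits_{(v,u) \in E,\ u \in V_x[t_2,t_4]} r_x(v,u) + \delta,\ \max\limits_{(v,u) \in E,\ u \in V_x[t_0,t_2)} r_x(v,u) \Big\} & \text{for } v \in V_W \cap V_x[t_0,t_2), \\[2ex]
\min\Big\{ \min\limits_{(v,u) \in E,\ u \in V_x[t_2,t_4]} r_x(v,u) + \delta,\ \min\limits_{(v,u) \in E,\ u \in V_x[t_0,t_2)} r_x(v,u) \Big\} & \text{for } v \in V_B \cap V_x[t_0,t_2), \\[2ex]
\sum\limits_{(v,u) \in E} p(v,u)\, r_x(v,u) + \delta \sum\limits_{(v,u) \in E,\ u \in V_x[t_2,t_4]} p(v,u) & \text{for } v \in V_R \cap V_x[t_0,t_2).
\end{cases}$$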


Here, a maximum (minimum) over an empty set of moves is taken to be $-\infty$ ($+\infty$). Note that $|m_x^\delta(v) - m_x(v)| \le \delta$. It is also important to note that $\delta \ge M_h/4$. Indeed, the $m$-values of positions from $V_x[t_2, t_4]$ cannot increase, while those of positions from $V_x[t_0, t_2)$ cannot decrease. Since each $m$-value changes by at most $\delta$, each of them would have to traverse a distance of at least $M_h/4$ before it can reach the border of the interval $[t_1, t_3]$. Moreover, if after some iteration one of the sets $V_x[t_0, t_1)$ or $V_x(t_3, t_4]$ becomes empty, then the range of $m_x$ is reduced by at least 25%.
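One pumping iteration (lines 7 and 11 of Algorithm 1) can be sketched directly from the case formula for $m_x^\delta$. Every case is monotone in $\delta$, so each position contributes at most one upper bound on $\delta$, and $\delta$ is the minimum of these bounds ($+\infty$ if there is none). The Python below is a hypothetical illustration, not the authors' implementation. Its test game is also hypothetical: the actual arcs of Figure 1 are not recoverable from the text, so this game was chosen only so that its first phase reproduces the values quoted in the example ($m_x = 4, -4, 0$ at $W, B, R$, then $\delta = 4$, then $m_x = 3, 0, -2$).

```python
import math
from fractions import Fraction


def m_values(owner, arcs, x):
    """m_x = M[r_x]; arcs[v] lists (u, r(v,u), p(v,u)), with p = None at W/B positions."""
    m = {}
    for v, out in arcs.items():
        shifted = [(r + x[v] - x[u], p) for u, r, p in out]
        if owner[v] == 'W':
            m[v] = max(val for val, _ in shifted)
        elif owner[v] == 'B':
            m[v] = min(val for val, _ in shifted)
        else:
            m[v] = sum(p * val for val, p in shifted)
    return m


def pump_step(owner, arcs, x, t):
    """Largest delta keeping m_x^delta in [t_1, t_3], plus the pumped potentials."""
    m = m_values(owner, arcs, x)
    upper = {v for v in arcs if m[v] >= t[2]}           # V_x[t_2, t_4]
    t1, t3 = t[1], t[3]
    delta = math.inf
    for v, out in arcs.items():
        up = v in upper
        # Moves staying on v's side keep r_x; crossing moves get -delta (from the
        # upper side) or +delta (from the lower side).
        stay = [(r + x[v] - x[u], p) for u, r, p in out if (u in upper) == up]
        cross = [(r + x[v] - x[u], p) for u, r, p in out if (u in upper) != up]
        kind = owner[v]
        if kind == 'R':
            s = sum(p for _, p in cross)
            if s > 0:
                delta = min(delta, (m[v] - t1) / s if up else (t3 - m[v]) / s)
        elif up and kind == 'W' and not any(val >= t1 for val, _ in stay):
            delta = min(delta, max(val for val, _ in cross) - t1)
        elif up and kind == 'B' and cross:
            delta = min(delta, min(val for val, _ in cross) - t1)
        elif not up and kind == 'W' and cross:
            delta = min(delta, t3 - max(val for val, _ in cross))
        elif not up and kind == 'B' and not any(val <= t3 for val, _ in stay):
            delta = min(delta, t3 - min(val for val, _ in cross))
    if delta == math.inf:
        return delta, dict(x)                          # termination case 1
    return delta, {v: x[v] - delta if v in upper else x[v] for v in x}


half = Fraction(1, 2)
owner = {'W': 'W', 'B': 'B', 'R': 'R'}
arcs = {'W': [('R', 3, None), ('B', 4, None)],
        'B': [('W', -4, None), ('R', -3, None)],
        'R': [('W', 0, half), ('B', 0, half)]}
x = {'W': 0, 'B': 0, 'R': 0}
print(m_values(owner, arcs, x))                 # W: 4, B: -4, R: 0
delta, x = pump_step(owner, arcs, x, [-4, -2, 0, 2, 4])
print(delta, x)                                 # 4, with x_W = x_R = -4 and x_B = 0
print(m_values(owner, arcs, x))                 # W: 3, B: 0, R: -2
```

In this game the binding constraint comes from the random position: half of its probability mass leads to the lower side, so pumping by $\delta$ lowers $m_x(R)$ by $\delta/2$, and it reaches $t_1 = -2$ exactly at $\delta = 4$.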

Procedure PUMP($\mathcal{G}$, $\varepsilon$) below tries to reduce any BWR-game $\mathcal{G}$ by a potential transformation $x$ into one in which $M_x < \varepsilon$. Two subroutines are used in the procedure. REDUCE-POTENTIALS($\mathcal{G}$, $x$) replaces the current potential $x$ with another potential with a sufficiently small norm; see the corresponding lemma in Subsection 3.4. This reduction is needed since without it the potentials, and hence the transformed local rewards too, may grow exponentially; see the analysis of $R_h$ and $N_h$ in the proof of that lemma. The second routine FIND-PARTITION($\mathcal{G}$, $x$) uses the current potential vector $x$ to construct a contra-ergodic decomposition of $\mathcal{G}$ (cf. line 19 of the algorithm below); see Subsection 3.3 for the details.

Note that the algorithm can terminate in three ways:

1. $\delta = +\infty$ in some iteration. In this case, $(V^+ = V_x[t_2, t_4],\ V^- = V_x[t_0, t_2),\ V^0 = \emptyset)$ is a contra-ergodic partition, as can be checked from the definition of $m_x^\delta$.

2. The number of pumping iterations performed in some phase $h$ reaches

$$N_h = \frac{8 n^2 R_h D^k}{M_h} + 1, \qquad (6)$$

where $R_h = r_h^+ - r_h^-$, and the range of $m_h$ is not reduced. In this case, we can find a contra-ergodic decomposition by the second subroutine FIND-PARTITION($\mathcal{G}$, $x$); see Lemma 3.

3. $M_x < \varepsilon$ and $\varepsilon$ satisfies (2). In this case we can prove that a pair of locally optimal strategies with respect to $r_x$ is optimal in the game $\mathcal{G}$; we show this in Section 3.5.

A simple example illustrating how the pumping algorithm works is given below. 
Figure 1: An example showing two iterations of the pumping algorithm.


Let us consider the example shown in the left part of Figure 1. Positions W and B are controlled by the maximizer and the minimizer, respectively. Position R is a random position with probability $\frac12$ on both outgoing arcs.

We assume that the potentials, shown next to the vertices in Figure 1, are initially $x_W = x_B = x_R = 0$.

In the first iteration ($h = 0$) of the pumping algorithm we have $m_x(W) = 4$, $m_x(B) = -4$ and $m_x(R) = 0$. The extremal arcs from the black and white vertices are shown as thick lines in Figure 1. Thus we have $M_0 = 8$, and hence $t_0 = -4$, $t_1 = -2$, $t_2 = 0$, $t_3 = 2$ and $t_4 = 4$. We get $V_x[t_2,t_4] = \{W, R\}$ and, as a maximal pumping move, $\delta = 4$. Thus we update $x_W = -4$, $x_R = -4$ and leave $x_B = 0$ unchanged. The middle part of Figure 1 shows the resulting graph, with the updated local rewards. Since $V_x[t_0,t_1) = \emptyset$, we need to recompute the range. Therefore, in the next step ($h = 1$) we get $m_x(W) = 3$, $m_x(B) = 0$ and $m_x(R) = -2$, yielding $M_1 = 5$, and $t_0 = -2$, $t_1 = -\frac34$, $t_2 = \frac12$, $t_3 = \frac74$, and $t_4 = 3$. Thus $V_x[t_2,t_4] = \{W\}$, and we pump $W$ by the maximal pumping distance $\delta$, that is, we update the potential $x_W := -4 - \delta$ and leave the other two potentials unchanged. The right part of Figure 1 shows the resulting graph. Since now $V_x(t_3,t_4] = \emptyset$, we again recompute the range parameters; in particular, we now get $m_x(B) = 1$ and a new range $M_2$. We leave it to the reader to follow the pumping algorithm on this small illustrative example.
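To make the arithmetic of this example easy to re-check, here is a minimal Python sketch of the local values $m_x$ and the thresholds $t_0, \ldots, t_4$. The dictionary encoding, the function names `local_values` and `thresholds`, and the three-position game are our own illustrative assumptions: the game is a hypothetical instance whose values $m_x$ agree with those quoted above for the first two steps, and it is not claimed to be the game drawn in Figure 1.

```python
from fractions import Fraction as F

# Hypothetical encoding: owner[v] is "W", "B" or "R"; arcs[v] lists (u, r(v,u), p(v,u)),
# where the probability is only used at random positions.
owner = {"W": "W", "B": "B", "R": "R"}
arcs = {
    "W": [("B", 4, None), ("R", 3, None)],
    "B": [("W", -4, None), ("R", -3, None)],
    "R": [("W", 1, F(1, 2)), ("B", -1, F(1, 2))],
}


def local_values(owner, arcs, x):
    """m_x(v) computed from the transformed rewards r_x(v,u) = r(v,u) + x(v) - x(u)."""
    m = {}
    for v, out in arcs.items():
        vals = [(r + x[v] - x[u], p) for (u, r, p) in out]
        if owner[v] == "W":          # White maximizes
            m[v] = max(val for val, _ in vals)
        elif owner[v] == "B":        # Black minimizes
            m[v] = min(val for val, _ in vals)
        else:                        # random position: expectation
            m[v] = sum(p * val for val, p in vals)
    return m


def thresholds(m):
    """Return M_x = m^+ - m^- and t_i = m^- + i * M_x / 4 for i = 0, ..., 4."""
    lo, hi = min(m.values()), max(m.values())
    M = hi - lo
    return M, [lo + F(i, 4) * M for i in range(5)]


m0 = local_values(owner, arcs, {"W": 0, "B": 0, "R": 0})
print(m0, thresholds(m0))
m1 = local_values(owner, arcs, {"W": -4, "B": 0, "R": -4})
print(m1, thresholds(m1))
```

Exact `Fraction` arithmetic is used so that thresholds such as $t_1 = -\frac34$ come out exactly rather than as rounded floats.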


To describe the main idea of the analysis, we will first argue in Section 3.2 that the algorithm terminates in finite time if the considered BWR-game is ergodic, even if we set $N_0 = +\infty$. In the following section, this argument will be made quantitative with a precise bound on the running time. In Section 3.6 we will show that our bound on the running time is tight, since we can construct a weighted Markov chain (that is, an R-game) providing an exponential lower bound in $k$.

Algorithm 1 PUMP($\mathcal{G}$, $\varepsilon$)

Input: A BWR-game $\mathcal{G} = (G = (V, E), P, r)$ and a desired accuracy $\varepsilon$

Output: Either a potential $x : V \to \mathbb{R}$ s.t. $|m_x(v) - m_x(u)| \le \varepsilon$ for all $u, v \in V$, or a contra-ergodic decomposition

1: let $h := 0$; $x := x^0 := 0$; $i := 1$
2: let $t_0, t_1, \ldots, t_4$ and $N_h$ be as given by $m_x$ and (6)
3: while $i \le N_h$ do
4:   if $M_x \le \varepsilon$ then
5:     return $x$
6:   end if
7:   $\delta := \max\{\delta' \mid m_{x-\delta'}(v) \ge t_1$ for all $v \in V_x[t_2,t_4]$ and $m_{x-\delta'}(v) \le t_3$ for all $v \in V_x[t_0,t_2)\}$, where $x - \delta'$ lowers $x$ by $\delta'$ on $V_x[t_2,t_4]$ only
8:   if $\delta = \infty$ then
9:     return the contra-ergodic partition ($V^+ = V_x[t_2,t_4]$, $V^- = V_x[t_0,t_2)$, $V^0 = \emptyset$)
10:  end if
11:  $x(v) := x(v) - \delta$ for all $v \in V_x[t_2,t_4]$
12:  if $V_x[t_0,t_1) = \emptyset$ or $V_x(t_3,t_4] = \emptyset$ then
13:    $x^{h+1} := x :=$ REDUCE-POTENTIALS($\mathcal{G}$, $x$); $h := h + 1$; $i := 1$
14:    recompute the thresholds $t_0, t_1, \ldots, t_4$ and $N_h$ using $m_x$ and (6)
15:  else
16:    $i := i + 1$
17:  end if
18: end while
19: $(V^+, V^-, V^0) :=$ FIND-PARTITION($\mathcal{G}$, $x$)
20: return the contra-ergodic partition $(V^+, V^-, V^0)$
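Line 7 of Algorithm 1 can be evaluated in closed form: lowering $x$ by $\delta'$ on $V_x[t_2,t_4]$ decreases $r_x(v,u)$ by $\delta'$ on arcs leaving that set, increases it by $\delta'$ on arcs entering it, and leaves all other arcs unchanged, so each $m_{x-\delta'}(v)$ is a simple monotone function of $\delta'$. The Python sketch below computes $\delta$ this way. It is an illustration under our own assumptions (the game encoding, the helper names `pumping_distance` and `_bound`, and the use of `math.inf` for the case $\delta = \infty$ of line 8), and it assumes, as holds inside Algorithm 1, that all constraints are satisfied at $\delta' = 0$.

```python
import math
from fractions import Fraction as F


def _bound(kind, terms, t, above):
    """Largest delta' with m(v) >= t (above=True) or m(v) <= t (above=False).

    terms lists (r_x, slope, p), where slope in {-1, 0, +1} is the change of r_x per
    unit of delta'. The constraint is assumed to hold at delta' = 0.
    """
    if kind == "R":                                   # m(v) is linear in delta'
        q = sum(p for _, s, p in terms if s != 0)
        m = sum(p * rx for rx, _, p in terms)
        if q == 0:
            return math.inf
        return (m - t) / q if above else (t - m) / q
    sign = 1 if above else -1                         # orient as "sign * (r_x - t) >= 0"
    fixed = [sign * (rx - t) for rx, s, _ in terms if s == 0]
    moving = [sign * (rx - t) for rx, s, _ in terms if s != 0]
    if (kind == "W") == above:
        # White above (max >= t) or Black below (min <= t): one good arc suffices
        if any(g >= 0 for g in fixed):
            return math.inf
        return max(moving, default=0)
    # Black above (min >= t) or White below (max <= t): every moving arc must stay good
    return min(moving, default=math.inf)


def pumping_distance(owner, arcs, x, top, t1, t3):
    """delta of line 7: lower x on `top` (= V_x[t2,t4]) while keeping m >= t1 there
    and m <= t3 on the remaining positions (= V_x[t0,t2))."""
    delta = math.inf
    for v, out in arcs.items():
        above = v in top
        terms = []
        for u, r, p in out:
            rx = r + x[v] - x[u]
            if above:
                slope = 0 if u in top else -1   # arc leaving the pumped set loses delta'
            else:
                slope = 1 if u in top else 0    # arc entering the pumped set gains delta'
            terms.append((rx, slope, p))
        delta = min(delta, _bound(owner[v], terms, t1 if above else t3, above))
    return delta


# Hypothetical three-position game (not necessarily the one in Figure 1).
owner = {"W": "W", "B": "B", "R": "R"}
arcs = {
    "W": [("B", 4, None), ("R", 3, None)],
    "B": [("W", -4, None), ("R", -3, None)],
    "R": [("W", 1, F(1, 2)), ("B", -1, F(1, 2))],
}
d0 = pumping_distance(owner, arcs, {"W": 0, "B": 0, "R": 0}, {"W", "R"}, -2, 2)
d1 = pumping_distance(owner, arcs, {"W": -4, "B": 0, "R": -4}, {"W"}, F(-3, 4), F(7, 4))
print(d0, d1)  # 4 15/4
```

On this hypothetical instance the first step gives $\delta = 4$, which is the maximal pumping move reported in the example above; a pumped set that no arc can leave yields `math.inf`, which corresponds to line 9.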

3.2 Proof of finiteness for the ergodic case 

To simplify notation, we can assume without loss of generality (by shifting and scaling the local rewards) that the range of $m_x$ is $[0,1]$, that is, $t_i = \frac{i}{4}$ for $i = 0, \ldots, 4$, and that the initial potential is $x^0 = 0$. Suppose, for a contradiction, that phase 0 of the algorithm does not terminate, that is, $V[0,\frac14)$ and $V(\frac34,1]$ never become empty, and hence the $m$-range always stays above $\frac12$.

Consider the infinite sequence of iterations and denote by $V^- \subseteq V$ (respectively, by $V^+ \subseteq V$) the set of vertices that were pumped only finitely many times (respectively, in all but finitely many iterations); in other words, $m_x(v) \in [0,\frac12)$ if $v \in V^-$ (respectively, $m_x(v) \in [\frac12,1]$ if $v \in V^+$) for all but finitely many iterations. By the above assumption, we have

$$\emptyset \ne V[0,\tfrac14) \subseteq V^- \subseteq V[0,\tfrac12), \qquad (7)$$

$$\emptyset \ne V(\tfrac34,1] \subseteq V^+ \subseteq V[\tfrac12,1]. \qquad (8)$$


Proposition 3 The partition $\Pi : V = V^+ \cup V^- \cup V^0$, where $V^0 = V \setminus (V^+ \cup V^-)$, is a contra-ergodic decomposition.

Proof First, let us check properties (i), (ii), (iii) of Section 2.4. Property (i) follows by (7) and (8). Let us observe that the transformed local rewards on any arc leaving $V^+$ (resp., $V^-$) tend to $-\infty$ (resp., $+\infty$). This implies (ii) and (iii).

Finally, it also follows from (7) and (8) that $m_x(v) \ge \frac12$ for all $v \in V^+$, while $m_x(v) < \frac12$ for all $v \in V^-$. Hence the claim follows by Proposition 5. □
In other words, our algorithm is finite for ergodic BWR-games. Below we shall give an upper bound on the number of iterations a vertex can “oscillate” in $[0,\frac14)$ or $(\frac34,1]$ before it finally enters $[\frac14,\frac34]$ (to stay there forever).

Remark 2 For general stochastic games (with imperfect information), the value might not exist [Gil57]. Nevertheless, the pumping algorithm can be extended [BEGM14]. This algorithm either (i) certifies that the game is $24\varepsilon$-ergodic, that is, it finds a potential transformation $x$ such that all local values $m_x(v)$ are within an interval of length $24\varepsilon$, or (ii) presents a contra-ergodic partition similar to the one described in Section 2.4; in particular, we obtain two positions $u$ and $v$ such that $|\mu(u) - \mu(v)| > \varepsilon$.

3.3 Finding a contra-ergodic decomposition: FIND-PARTITION($\mathcal{G}$, $x$)

We assume throughout this section that we are inside phase $h$ of the algorithm, which started with a potential $x^h$, and that we proceed to step 19. For simplicity, we assume that the phase starts with the local reward function $r = r_h$, and hence $x^h = 0$.¹ Given a potential vector $x$, we use the following notation [GKK88]:

$$\mathrm{EXT}_x = \{(v,u) \in E \mid v \in V_B \cup V_W \text{ and } r_x(v,u) = m_x(v)\},$$

and recall that $x^- = \min\{x(v) \mid v \in V\}$. Let $t_l \le 0$ be the largest value satisfying the following conditions:

(i) there are no arcs $(v,u) \in E$ with $v \in V_W \cup V_R$, $x(v) \ge t_l$ and $x(u) < t_l$;

(ii) there are no arcs $(v,u) \in \mathrm{EXT}_x$ with $v \in V_B$, $x(v) \ge t_l$ and $x(u) < t_l$.

Let $X = \{v \in V \mid x(v) \ge t_l\}$. In words, $X$ is the set of positions with potential as close to 0 as possible, such that no white or random position in $X$ has an arc crossing to $V \setminus X$, and no black position has an extremal arc crossing to $V \setminus X$. Similarly, define $t_u \ge x^-$ to be the smallest value satisfying the following conditions:

(iii) there are no arcs $(v,u) \in E$ with $v \in V_B \cup V_R$, $x(v) \le t_u$ and $x(u) > t_u$;

(iv) there are no arcs $(v,u) \in \mathrm{EXT}_x$ with $v \in V_W$, $x(v) \le t_u$ and $x(u) > t_u$,

and let $Y = \{v \in V \mid x(v) \le t_u\}$. Note that both $t_l$ and $t_u$ trivially exist. Note also that the sets $X$ and $Y$ can be computed in $O(|V|\log|V| + |E|)$ time.
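Because conditions (i) and (ii) depend only on which positions lie in $\{v : x(v) \ge t_l\}$, the value $t_l$ can be taken among the potential values themselves (we assume here, as in this section, that the largest potential is 0). The following deliberately simple Python sketch scans those candidates from the top; it runs in $O(|V| \cdot |E|)$ time rather than the $O(|V|\log|V| + |E|)$ mentioned above, and the encoding, the function name `lower_threshold` and the small game are illustrative assumptions. $t_u$ and $Y$ can be obtained symmetrically.

```python
from fractions import Fraction as F


def lower_threshold(owner, arcs, x):
    """Return (t_l, X) for conditions (i)-(ii) of Section 3.3, assuming max potential is 0.

    owner[v] in {"W", "B", "R"}; arcs[v] is a list of (u, r(v,u), p(v,u)).
    """
    blocking = []  # arcs that may not cross the threshold downwards
    for v, out in arcs.items():
        rx = [(u, r + x[v] - x[u]) for u, r, _ in out]
        if owner[v] == "B":
            m = min(val for _, val in rx)
            blocking += [(v, u) for u, val in rx if val == m]  # extremal arcs (EXT_x)
        else:
            blocking += [(v, u) for u, _ in rx]                # all White/random arcs
    # the conditions only depend on {v : x(v) >= c}, so c ranges over potential values
    for c in sorted({val for val in x.values() if val <= 0}, reverse=True):
        if not any(x[v] >= c > x[u] for v, u in blocking):
            return c, {v for v in x if x[v] >= c}
    raise ValueError("expected some potential value <= 0")


# Hypothetical example: a is White, b is Black, c is random.
owner = {"a": "W", "b": "B", "c": "R"}
x = {"a": 0, "b": -1, "c": -3}
half = F(1, 2)
arcs1 = {"a": [("b", 0, None)], "b": [("a", 1, None), ("c", 5, None)],
         "c": [("a", 0, half), ("b", 0, half)]}
arcs2 = dict(arcs1, b=[("a", 10, None), ("c", -5, None)])
print(lower_threshold(owner, arcs1, x))  # t_l = -1 and X = {a, b}
print(lower_threshold(owner, arcs2, x))  # extremal arc b -> c now blocks c = -1
```

In the second call the extremal arc of the black position $b$ points down to $c$, so the threshold has to drop to $x^- = -3$ and $X$ becomes the whole position set.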

Lemma 1 It holds that $\max\{-t_l,\ t_u - x^-\} \le nR_hD^k$.

To prove Lemma 1, we need the following lemma.

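Lemma 2 Consider any move $(v,u) \in E$ and let $x$ be the current potential. Then

$$x(u) \ge \begin{cases} x(v) - (m^+_x - r^-_h) & \text{if either } (v \in V_W \text{ and } (v,u) \in E) \text{ or } (v \in V_B \text{ and } (v,u) \in \mathrm{EXT}_x),\\ D\,[x(v) - (m^+_x - r^-_h)] & \text{if } v \in V_R \text{ and } (v,u) \in E,\end{cases}$$

and

$$x(u) \le \begin{cases} x(v) + r^+_h - m^-_x & \text{if either } (v \in V_B \text{ and } (v,u) \in E) \text{ or } (v \in V_W \text{ and } (v,u) \in \mathrm{EXT}_x),\\ D\,[x(v) + r^+_h - m^-_x - (1 - \frac1D)\,x^-] & \text{if } v \in V_R \text{ and } (v,u) \in E.\end{cases}$$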

Proof We only consider the case $v \in V_R$, as the other claims are obvious from the definitions. For the first claim, assume that $x(v) > x(u)$, since otherwise there is nothing to prove. Then from $m_x(v) \le m^+_x$ it follows that

$$m^+_x \ge \sum_{u'} p(v,u')\, r_x(v,u') \ge r^-_h + p(v,u)\big(x(v) - x(u)\big) + \sum_{u' \ne u} p(v,u')\big(x(v) - x(u')\big) \ge r^-_h + \frac1D\big(x(v) - x(u)\big) + x(v)\Big(1 - \frac1D\Big),$$

where the last inequality follows from $p(v,u) \ge \frac1D$ and the non-positivity of $x$. This proves the first claim. The other claim can be proved by a similar argument (by replacing $x(\cdot)$ by $x^- - x(\cdot)$ and $m^+_x - r^-_h$ by $r^+_h - m^-_x$). □

¹ In particular, note that $r_x(v,u)$ and $m_x(v)$ are used, for simplicity of notation, to actually mean $r_{x^h+x}(v,u)$ and $m_{x^h+x}(v)$, respectively.

Proof of Lemma 1 By definition of $X$, for every position $v \in X$ there must exist (not necessarily distinct) positions $v_0, v_1, \ldots, v_{2j} = v \in X$, $j \le |X|$, such that $x(v_0) = 0$, and for $i = 1, \ldots, j$, $x(v_{2i}) \ge x(v_{2i-1})$, and either ($(v_{2i-2}, v_{2i-1}) \in E$ and $v_{2i-2} \in V_W \cup V_R$) or ($(v_{2i-2}, v_{2i-1}) \in \mathrm{EXT}_x$ and $v_{2i-2} \in V_B$). Among the even-numbered positions, let $v_{2i_1-2}, \ldots, v_{2i_l-2}$ be the ones belonging to $V_R$, and assume without loss of generality that $l > 0$ and $i_1 < i_2 < \cdots < i_l$. Using Lemma 2 we obtain the following inequality by a telescoping sum:

$$x(v_{2i_{q+1}-2}) \ge x(v_{2i_q-1}) - (i_{q+1} - i_q - 1)(m^+_x - r^-_h), \quad \text{for } q = 1, \ldots, l-1, \qquad (9)$$

and $x(v_{2i_1-2}) \ge -(i_1 - 1)(m^+_x - r^-_h)$.

Now applying Lemma 2 to the pair $v_{2i_q-2} \in V_R$ and $v_{2i_q-1}$, for $q = 1, \ldots, l-1$, and using (9) we obtain:

$$x_{q+1} \ge D x_q - (D + i_{q+1} - i_q - 1)(m^+_x - r^-_h), \qquad x_1 \ge -(i_1 - 1)(m^+_x - r^-_h), \qquad (10)$$

where we write, for convenience, $x_q = x(v_{2i_q-2})$, for $q = 1, \ldots, l$. Iterating, we obtain:

$$x_l \ge -\Big(D^{l-1}(i_1 - 1) + \sum_{q=2}^{l} D^{l-q}(D + i_q - i_{q-1} - 1)\Big)(m^+_x - r^-_h).$$

Combining this with the inequality $x(v) \ge D x_l - (D + j - i_l)(m^+_x - r^-_h)$ and using $D \ge 1$, we get

$$x(v) \ge -D^l j\,(m^+_x - r^-_h) \ge -D^k |X|\,(m^+_x - r^-_h).$$

Similarly, one can prove for any $v \in Y$ that $x(v) \le x^- + D^k|Y|(r^+_h - m^-_x)$, and the lemma follows. □


The correctness of the algorithm follows from the following lemma.

Lemma 3 Suppose that pumping is performed for $N_h \ge 2nT_h + 1$ iterations, where $T_h = \frac{4nR_hD^k}{M_h}$, and neither the set $V_x[t_0,t_1)$ nor $V_x(t_3,t_4]$ becomes empty. Let $V^- = X$ and $V^+ = Y$ be the sets constructed as above, and $V^0 = V \setminus (X \cup Y)$. Then $V^+ \cup V^- \cup V^0$ is a contra-ergodic decomposition.

Proof We pump in each iteration by $\delta \ge \frac{M_h}{4}$. Furthermore, our formula for $\delta$ implies that once a vertex enters the region $V_x[t_1,t_3]$, it never leaves this region. In particular, there are vertices $v_0 \in X \cap V_x[t_0,t_1)$ and $v_n \in Y \cap V_x(t_3,t_4]$ with $x(v_0) = 0$ and $x(v_n) = x^-$.

For a vertex $v \in V$, let $N(v)$ denote the number of times the vertex was pumped. Then $N(v_0) = 0$ and $N(v_n) = N_h$.

We claim that $N(v) \le T_h$ for any $v \in X$, and $N(v) \ge N_h - T_h$ for any $v \in Y$ (i.e., every vertex in $X$ was pumped in at most $T_h$ steps, and every vertex in $Y$ was pumped in all but at most $T_h$ steps). Indeed, if $v \in X$ was pumped more than $T_h$ times (respectively, $v \in Y$ was pumped fewer than $N_h - T_h$ times), then $x(v) - x(v_0) < -nR_hD^k$ (respectively, $x(v_n) - x(v) < -nR_hD^k$), in contradiction to Lemma 1.

Since $N_h > 2T_h$, it follows that $X \cap Y = \emptyset$. Furthermore, among the first $2nT_h + 1$ iterations, in at most $nT_h$ iterations some vertex $v \in X$ was pumped, and in at most $nT_h$ iterations some vertex in $Y$ was not pumped. Thus, there must exist an iteration at which no vertex of $X$ was pumped and every vertex of $Y$ was pumped. At that particular iteration, we must have $X \subseteq V_x[t_0,t_2)$ and $Y \subseteq V_x[t_2,t_4]$, and hence $m_x(v) < t_2$ for every $v \in X$ and $m_x(v) \ge t_2$ for every $v \in Y$. By the way the sets $X$ and $Y$ were constructed, we can easily see that $X$ and $Y$ will continue to have this property till the end of the iterations, and hence they induce a contra-ergodic partition. The lemma follows. □


3.4 Potential reduction: REDUCE-POTENTIALS($\mathcal{G}$, $x$)

One problem that arises during the pumping procedure is that the potentials can increase exponentially in the number of phases, making our bounds on the number of iterations per phase also exponential in $n$. For the BW-case, Pisaruk [Pis99] solved this problem by giving a procedure that reduces the range of the potentials after each round, while keeping all the properties needed for the running time analysis.

Pisaruk’s potential reduction procedure can be thought of as a combinatorial procedure for finding an extreme point of a polyhedron, given a point in it. Indeed, given a BWR-game and a potential $x$, let us assume without loss of generality, by shifting the potentials if necessary, that $x \ge 0$, and let $E' = \{(v,u) \in E \mid r_x(v,u) \in [m^-_x, m^+_x],\ v \in V_B \cup V_W\}$, where $r$ is the original local reward function. Then the following polyhedron

$$\Gamma_x = \left\{ x' \in \mathbb{R}^V \;\middle|\; \begin{array}{ll} m^-_x \le r(v,u) + x'(v) - x'(u) \le m^+_x, & \forall (v,u) \in E',\\ r(v,u) + x'(v) - x'(u) \le m^+_x, & \forall v \in V_W,\ (v,u) \in E \setminus E',\\ m^-_x \le r(v,u) + x'(v) - x'(u), & \forall v \in V_B,\ (v,u) \in E \setminus E',\\ m^-_x \le \sum_{u} p(v,u)\big(r(v,u) + x'(v) - x'(u)\big) \le m^+_x, & \forall v \in V_R,\\ x'(v) \ge 0, & \forall v \in V \end{array} \right\}$$

is non-empty, since $x \in \Gamma_x$. Moreover, $\Gamma_x$ is pointed, and hence it must have an extreme point. Let us remark that, given a feasible point $x$, an extreme point can be computed in $O(n^2|E|)$ time (see, e.g., [Sch03]).

Lemma 4 Consider a BWR-game in which all rewards are integral with range $R = r^+ - r^-$, and the probabilities $p(v,u)$ are rational with common denominator $D$, and let $k = |V_R|$. Then any extreme point $x^*$ of $\Gamma_x$ satisfies $\|x^*\|_\infty \le nRk(2D)^k$.

Proof Consider such an extreme point $x^*$. Then $x^*$ is uniquely determined by a system of $n$ linearly independent equations chosen from the given inequalities. Thus there exist subsets $V' \subseteq V$, $V'_R \subseteq V_R$ and $E'' \subseteq E$ such that $|V'| + |V'_R| + |E''| = n$, and $x^*$ is the unique solution of the subsystem $x'(v) = 0$ for all $v \in V'$, $x'(v) - x'(u) = m^* - r(v,u)$ for $(v,u) \in E''$, and $x'(v) - \sum_u p(v,u)x'(u) = m^* - \sum_u p(v,u)r(v,u)$ for $v \in V'_R$, where $m^*$ stands for either $m^-_x$ or $m^+_x$.

Note that all variables $x'(v)$ must appear in this subsystem, and that the underlying undirected graph of the digraph $G' = (V, E'')$ must be a forest (otherwise the subsystem does not uniquely fix $x^*$, or it is not linearly independent).

Consider first the case $V_R = \emptyset$. For $i \ge 0$, let $V_i$ be the set of vertices of $V$ at (undirected) distance $i$ from $V'$ (observe that $i$ is finite for every vertex). Then we claim by induction on $i$ that $x^*(v) \le i\gamma$ for all $v \in V_i$, where $\gamma = \max\{m^+_x - r^-,\ r^+ - m^-_x\}$. This is trivially true for $i = 0$. So let us assume that it is also true for some $i \ge 0$. For any $v \in V_{i+1}$, there must exist either an arc $(v,u)$ or an arc $(u,v)$ where $u \in V_i$. In the former case, we have $x^*(v) = x^*(u) + m^* - r(v,u) \le i\gamma + m^+_x - r^- \le (i+1)\gamma$. In the latter case, we have $x^*(v) = x^*(u) - (m^* - r(u,v)) \le i\gamma + r^+ - m^-_x \le (i+1)\gamma$.

Now suppose that $|V_R| > 0$. For each connected component $D_i$ in the forest $G'$, let us fix a position $v_i$ from $V'$ if $D_i \cap V' \ne \emptyset$; otherwise, $v_i$ is chosen arbitrarily. For every position $v \in D_i$ let $\mathcal{P}_v$ be a (not necessarily directed) path from $v$ to $v_i$. Thus, we can write $x'(v)$ uniquely as

$$x'(v) = x'(v_i) + \epsilon_{v,1}\, m^-_x + \epsilon_{v,2}\, m^+_x + \sum_{(u',u'') \in \mathcal{P}_v} \epsilon_{v,u',u''}\, r(u',u''), \qquad (11)$$

for some $\epsilon_{v,1}, \epsilon_{v,2} \in \mathbb{Z}$ and $\epsilon_{v,u',u''} \in \{-1, 1\}$. Thus if $x^*(v_i) = 0$ for some component $D_i$, then by a similar argument as above, $x^*(v) \le \gamma|D_i|$ for every $v \in D_i$.

Note that, up to this point, we have used all equations corresponding to arcs in $G'$ and to vertices in $V'$. The remaining set of $|V'_R|$ equations should uniquely determine the values of the variables in any component which has no position in $V'$. Substituting the values of $x'(v)$ from (11) for the positions in any such component, we end up with a linearly independent system $Ax = b$ on $k' = |V'_R|$ variables, where $A$ is a $k' \times k'$ matrix in which each entry is at most 1 in absolute value and the sum of each row is at most 2 in absolute value, and $\|b\|_\infty \le n(R + M_x) \le 2nR$.

The rest of the argument follows (in a standard way) by Cramer's rule. Indeed, the value of each component in the solution is given by $\Delta'/\Delta$, where $\Delta$ is the determinant of $A$ and $\Delta'$ is the determinant of a matrix obtained by replacing one column of $A$ by $b$. We upper bound $\Delta'$ by $k'\|b\|_\infty \Delta_{\max}$, where $\Delta_{\max}$ is the maximum absolute value of a subdeterminant of $A$ of size $k' - 1$. To bound $\Delta_{\max}$, let us consider such a subdeterminant with rows $a_1, \ldots, a_{k'-1}$, and use Hadamard's inequality:

$$\Delta_{\max} \le \prod_{i=1}^{k'-1} \|a_i\|_2 \le 2^{k'-1},$$

since $\|a_i\|_2 \le \|a_i\|_1 \le 2$ for all $i$. To lower bound $\Delta$, we note that $D^{k'}\Delta$ is a non-zero integer, and hence has absolute value at least 1. Combining the above inequalities, the lemma follows. □

Note that any point $x' \in \Gamma_x$ satisfies $[m^-_{x'}, m^+_{x'}] \subseteq [m^-_x, m^+_x]$, and hence replacing $x$ by $x^*$ does not increase the range of $m_x$.
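The constraints defining $\Gamma_x$ can be checked one arc at a time. As a hedged illustration (our own encoding, hypothetical function names `local_values` and `in_gamma`, and a hypothetical three-position game with $x \ge 0$), the sketch below only tests whether a potential $x'$ belongs to $\Gamma_x$; it does not compute an extreme point, for which we refer to the literature cited above.

```python
from fractions import Fraction as F


def local_values(owner, arcs, x):
    """m_x(v): max (White), min (Black) or expectation (random) of r(v,u) + x(v) - x(u)."""
    m = {}
    for v, out in arcs.items():
        vals = [(r + x[v] - x[u], p) for u, r, p in out]
        if owner[v] == "W":
            m[v] = max(val for val, _ in vals)
        elif owner[v] == "B":
            m[v] = min(val for val, _ in vals)
        else:
            m[v] = sum(p * val for val, p in vals)
    return m


def in_gamma(owner, arcs, x, xp):
    """Test whether the potential xp belongs to Gamma_x (x is assumed to be >= 0)."""
    m = local_values(owner, arcs, x)
    lo, hi = min(m.values()), max(m.values())
    for v, out in arcs.items():
        if xp[v] < 0:
            return False
        if owner[v] == "R":
            val = sum(p * (r + xp[v] - xp[u]) for u, r, p in out)
            if not lo <= val <= hi:
                return False
            continue
        for u, r, _ in out:
            new = r + xp[v] - xp[u]
            if lo <= r + x[v] - x[u] <= hi:   # (v, u) is in E'
                ok = lo <= new <= hi
            elif owner[v] == "W":             # White arc outside E'
                ok = new <= hi
            else:                             # Black arc outside E'
                ok = lo <= new
            if not ok:
                return False
    return True


# Hypothetical three-position game and a non-negative potential x.
owner = {"W": "W", "B": "B", "R": "R"}
arcs = {
    "W": [("B", 4, None), ("R", 3, None)],
    "B": [("W", -4, None), ("R", -3, None)],
    "R": [("W", 1, F(1, 2)), ("B", -1, F(1, 2))],
}
x = {"W": 0, "B": 4, "R": 0}
print(in_gamma(owner, arcs, x, x))                           # True: x itself is feasible
print(in_gamma(owner, arcs, x, {"W": 1, "B": 5, "R": 1}))    # True: shifts stay feasible
print(in_gamma(owner, arcs, x, {"W": 100, "B": 4, "R": 0}))  # False
```

The first call mirrors the observation that $\Gamma_x$ is non-empty because $x \in \Gamma_x$; the last one fails because the white arc $(W,B)$ lies in $E'$ and its transformed reward leaves $[m^-_x, m^+_x]$.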

3.5 Running time analysis 

From Lemmas 3 and 4 we can conclude the following bound.

Lemma 5 Procedure PUMP(Q,e) terminates in 0{n‘^\E\R{nk{2D‘^)'^^ -l-logi?) time. 

Proof We note the following:

1. By (6), the number of iterations per phase $h$ is at most $N_h = \frac{8n^2R_hD^k}{M_h} + 1$.

2. Each iteration requires $O(|E|)$ time, and at the end of a phase we need an additional $O(n^2|E|)$ time (which is required for REDUCE-POTENTIALS).

3. By Lemma 4, for any $(v,u) \in E$, we have $r_x(v,u) = r(v,u) + x(v) - x(u) \le r(v,u) + 2nk(2D)^kR$, and similarly, $r_x(v,u) \ge r(v,u) - 2nk(2D)^kR$. In particular, $R_h \le (1 + 4nk(2D)^k)R$ at the beginning of each phase $h$ in the procedure.

Since $M_h \le \frac34 M_{h-1}$ for $h = 1, 2, \ldots$, the maximum number of such phases until we reach the required accuracy is at most $H = \log_{4/3}(M_0/\varepsilon)$. Putting all the above together, we get that the total running time is at most

$$\sum_{h=0}^{H} \left[ \left( \frac{8n^2R_hD^k}{M_h} + 1 \right) O(|E|) + O(n^2|E|) \right].$$

Noting that $M_0 \le R$ and $M_h \ge \varepsilon$, the lemma follows. □

We now give an upper bound on the required accuracy $\varepsilon$.

Lemma 6 Consider a Markov chain $\mathcal{M} = (G = (V,E), P)$ with $n$ positions among which $k$ are random and in which all the entries of the transition matrix $P$ are rational numbers with common denominator $D$. Assume that $\mathcal{M}$ has only one absorbing class. Let $p^*$ be the limiting distribution starting from any position. Then $p^*(v)$, for all positions $v$, are rational numbers with common denominator at most $(k+1)n(2D)^{k+1}$.
Proof For $v \in V$, let $\mathcal{R}(v)$ be the set of positions that can reach $v$ by a directed path in $G$, all of whose internal positions (if any) are in $V_B \cup V_W$. Note that $\mathcal{R}(v) \cap V_R \ne \emptyset$ for each $v \in V_B \cup V_W$. Otherwise, there is a position $v \in V_B \cup V_W$ such that no random position in $V$ can reach $v$, implying by the strong connectivity of $G$ that $V_R = \emptyset$, and hence $\mathcal{M}$ must be a Hamiltonian cycle on deterministic positions with $p^*(v) = \frac1n$ for all $v \in V$.

Consider the system of equations in (L5) defining $p^*(v)$:

$$p^*(v) = \sum_{u \in V_B \cup V_W:\ (u,v) \in E} p^*(u) + \sum_{u \in V_R} p(u,v)\,p^*(u), \quad \text{for } v \in V, \qquad \sum_{v \in V} p^*(v) = 1.$$

Eliminating the variables $p^*(v)$ for $v \in V_B \cup V_W$:

$$p^*(v) = \sum_{u \in \mathcal{R}(v) \cap V_R} p'(u,v)\,p^*(u), \qquad (12)$$

where $p'(u,v) = p(u,v) + \sum_{u' \in \mathcal{R}(v) \cap (V_B \cup V_W)} p(u,u')$, we end up with a system on only random positions $v \in V_R$: $p^*(v) = \sum_{u \in V_R} p'(u,v)\,p^*(u)$. (Note that $\sum_{v \in V_R} p'(u,v) = 1$ for every $u \in V_R$.) Similarly, we can reduce the normalization equation to $\sum_{v \in V_R} \big(1 + \sum_{u \in V_B \cup V_W:\ v \in \mathcal{R}(u)} p'(v,u)\big)\, p^*(v) = 1$. This gives a system on $k$ variables of the form $(p^*)^T(I - P') = 0$, $(p^*)^T b = 1$, where the matrix $P'$ and the vector $b$ have rational entries with common denominator $D$, each row of $P'$ sums up to 1, and each $b_i$ is a rational number in $[1, n]$ with denominator $D$.

Let us multiply all the equations of this system by $D$ and note that all coefficients of the resulting system of equations are integers in $[-D, D]$ for the first $k - 1$ equations, and in $[D, nD]$ for the normalization equation. Any non-zero component $p^*(v)$ in the solution of this system is a ratio of two integers whose denominator has the form $\sum_i D b_i \Delta_i$, where the $\Delta_i$ are subdeterminants of $D(I - P')$ of rank $k - 1$. It follows by Hadamard's inequality that $\Delta_i \le (2D)^k$, and hence $\sum_i D b_i \Delta_i$ is an integer of value at most $kn(2D)^{k+1}$.

After solving this system, we can get the value of $p^*(v)$, for $v \in V_W \cup V_B$, from (12) as rational numbers of common denominator at most $kn(2D)^{k+1}$. □
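Lemma 6 concerns exact rational solutions of $(p^*)^T(I - P) = 0$ with $\sum_v p^*(v) = 1$. The following Python sketch, which is our own illustration and not part of the proof, computes $p^*$ exactly by Gauss-Jordan elimination over the rationals for a chain with a single recurrent class; dropping one balance equation is valid because the balance equations sum to zero. The function name `stationary` and the three-position chain with $D = 3$ are hypothetical.

```python
from fractions import Fraction as F


def stationary(P):
    """Exact limiting distribution p* of a chain with a single recurrent class.

    P is a list of rows of Fractions. Solves p(I - P) = 0 and sum(p) = 1 over Q;
    one balance equation is dropped because the balance equations sum to zero.
    """
    n = len(P)
    A = [[P[u][v] - (1 if u == v else 0) for u in range(n)] + [F(0)] for v in range(n - 1)]
    A.append([F(1)] * n + [F(1)])                      # normalization row
    for col in range(n):                               # Gauss-Jordan elimination
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        pivot = A[col][col]
        A[col] = [a / pivot for a in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


# Hypothetical chain with D = 3: positions 0 and 2 random, position 1 deterministic.
t = F(1, 3)
P = [[t, 2 * t, F(0)],
     [F(0), F(0), F(1)],
     [2 * t, F(0), t]]
print(stationary(P))   # [Fraction(3, 8), Fraction(1, 4), Fraction(3, 8)]
```

Exact arithmetic makes the rational nature of $p^*$ visible directly; the sketch does not attempt to reproduce the denominator bound of the lemma.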


Theorem 4 When procedure PUMP($\mathcal{G}$, $\varepsilon$) is run with $\varepsilon$ as in (5), it either outputs a potential vector $x$ such that $m_x(v)$ is constant for all $v \in V$, or finds a contra-ergodic partition. The total running time is $\mathrm{poly}(n)\,(2D)^{O(k)} R \log R$.

Proof Suppose that the game is not ergodic and let us fix an optimal situation (pair of optimal strategies) $s$. Then there must exist at least two absorbing classes in the obtained weighted Markov chain $\mathcal{G}_s$ which have different values. Consider such an absorbing class and contract all arcs $(v,u)$ where $v \in V_W \cup V_B$. We obtain an absorbing Markov chain on a subset of $V_R$. We now can apply Lemma 6 to conclude that the value of any position in this class is a rational number with denominator at most $(k+1)n(2D)^{k+1}$. Consequently, the difference between any two different values of absorbing classes of $\mathcal{G}_s$ is at least $\big((k+1)^2n^2(2D)^{2k+2}\big)^{-1}$. If PUMP($\mathcal{G}$, $\varepsilon$) terminates with $M_x \le \varepsilon$, then we conclude, analogously to the previous arguments, that all $m_x(v)$ values are the same. This implies by Proposition 1 that this is the optimal value and the locally optimal moves are globally optimal. The running time bound now follows from Lemma 5. □


3.6 Lower bound for ergodic games 

Note that if $|V_R| = 0$, there is a simple example with only one player showing that the running time of the pumping algorithm can be linear in $R$; see Figure 2.

We show now that the execution time of the algorithm, in the worst case, can be exponential in the number $k$ of random positions, already for weighted Markov chains, that is, for R-games.
Figure 2: A lower bound example for the pumping algorithm for B-games. In this example, positions $u_2$ and $u_3$ will oscillate $2R$ times before converging to value 0. Labels attached to the arcs represent rewards.


Consider the following example; see Figure 3. Let $G = (V,E)$ be a digraph on $k = 2l + 1$ vertices $u_l, \ldots, u_1, u_0 = v_0, v_1, \ldots, v_l$, and with the following set of arcs:

$$E = \{(u_l,u_l), (v_l,v_l)\} \cup \{(u_{i-1},u_i), (u_i,u_{i-1}), (v_{i-1},v_i), (v_i,v_{i-1}) \mid i = 1, \ldots, l\}.$$

Let $D > 2$ be an integer. All positions are random with the following transition probabilities: $p(u_l,u_l) = p(v_l,v_l) = 1 - \frac1D$, $p(u_0,u_1) = p(u_0,v_1) = \frac12$, $p(u_{i-1},u_i) = p(v_{i-1},v_i) = 1 - \frac1D$ for $i = 2, \ldots, l$, and $p(u_i,u_{i-1}) = p(v_i,v_{i-1}) = \frac1D$ for $i = 1, \ldots, l$. The local rewards are zero on every arc, except for $r(u_l,u_l) = -r(v_l,v_l) = 1$ (see Figure 3 for $l = 3$). Clearly this Markov chain consists of a single recurrent class, and it is easy to verify that the limiting distribution $p^*$ is as follows:

$$p^*(u_0) = \frac{D-2}{D(D-1)^l - 2}, \qquad p^*(u_i) = p^*(v_i) = \frac{D(D-2)(D-1)^{i-1}}{2\big(D(D-1)^l - 2\big)} \quad \text{for } i = 1, \ldots, l.$$

The optimal expected reward at each vertex is

$$\mu(u_i) = \mu(v_i) = -1 \cdot \Big(1 - \frac1D\Big)p^*(v_l) + 1 \cdot \Big(1 - \frac1D\Big)p^*(u_l) = 0,$$


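for $i = 0, \ldots, l$. Up to a shift, there is a unique set of potentials $x$ that transform the Markov chain into the canonical form, and they satisfy the following system of equations:

$$\begin{aligned} 0 &= -\tfrac12 \Delta_1 - \tfrac12 \Delta'_1,\\ 0 &= -\Big(1 - \tfrac1D\Big)\Delta_{i+1} + \tfrac1D \Delta_i, && \text{for } i = 1, \ldots, l-1,\\ 0 &= -\Big(1 - \tfrac1D\Big)\Delta'_{i+1} + \tfrac1D \Delta'_i, && \text{for } i = 1, \ldots, l-1,\\ 0 &= \Big(1 - \tfrac1D\Big) + \tfrac1D \Delta_l,\\ 0 &= -\Big(1 - \tfrac1D\Big) + \tfrac1D \Delta'_l, \end{aligned}$$

where $\Delta_i = x(u_i) - x(u_{i-1})$ and $\Delta'_i = x(v_i) - x(v_{i-1})$; by solving the system, we get $\Delta_i = -\Delta'_i = -(D-1)^{l-i+1}$ for $i = 1, \ldots, l$.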
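The closed forms for $p^*$ and for the potential differences $\Delta_i$ in this example can be verified mechanically for small $l$ and $D$. The Python sketch below is our own check, with a hypothetical tuple encoding `("u", i)`, `("v", i)` of the positions and hypothetical function names: it builds the chain, asserts that the closed-form $p^*$ is stationary, and asserts that the potentials obtained from $\Delta_i = -\Delta'_i = -(D-1)^{l-i+1}$ make the expected transformed reward zero at every position.

```python
from fractions import Fraction as F


def lower_bound_chain(l, D):
    """Transitions P and nonzero rewards R of the chain of Section 3.6 (positions keyed
    as ('u', i) and ('v', i); u_0 = v_0 is stored as ('u', 0))."""
    q = F(1, D)
    P = {}
    for side in ("u", "v"):
        P[(("u", 0), (side, 1))] = F(1, 2)
        for i in range(1, l + 1):
            prev = ("u", 0) if i == 1 else (side, i - 1)
            P[((side, i), prev)] = q                   # step back towards u_0
            if i >= 2:
                P[((side, i - 1), (side, i))] = 1 - q  # step away from u_0
        P[((side, l), (side, l))] = 1 - q              # loop at the far end
    R = {(("u", l), ("u", l)): 1, (("v", l), ("v", l)): -1}
    return P, R


def closed_forms(l, D):
    """p* and potentials x (normalized by x(u_0) = 0) from the formulas in the text."""
    Z = D * (D - 1) ** l - 2
    pi = {("u", 0): F(D - 2, Z)}
    x = {("u", 0): 0}
    for side, sign in (("u", -1), ("v", 1)):
        acc = 0
        for i in range(1, l + 1):
            pi[(side, i)] = F(D * (D - 2) * (D - 1) ** (i - 1), 2 * Z)
            acc += sign * (D - 1) ** (l - i + 1)   # Delta_i = -Delta'_i = -(D-1)^(l-i+1)
            x[(side, i)] = acc
    return pi, x


def check(l, D):
    P, R = lower_bound_chain(l, D)
    pi, x = closed_forms(l, D)
    for s in pi:
        out = [(b, p) for (a, b), p in P.items() if a == s]
        assert sum(p for _, p in out) == 1                                   # stochastic row
        assert sum(pi[a] * p for (a, b), p in P.items() if b == s) == pi[s]  # stationarity
        # canonical form with value 0: expected transformed reward vanishes
        assert sum(p * (R.get((s, b), 0) + x[s] - x[b]) for b, p in out) == 0
    assert sum(pi.values()) == 1
    return True


print(all(check(l, D) for l in range(1, 5) for D in range(3, 8)))  # True
print(closed_forms(3, 5)[1][("u", 1)])  # -64, i.e. -(D-1)^l
```

The second line illustrates the point of the example: the potential gap $|x(u_1) - x(u_0)| = (D-1)^l$ that any pumping algorithm has to build up grows exponentially in $l$.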

Lower bound on pumping algorithms. Any pumping algorithm that starts with 0 potentials and modifies the potentials in each iteration by at most $\gamma$ will need at least $(D-1)^l/\gamma$ iterations on the above example. In particular, the algorithm in Section 3 has $\gamma \le 1/\min\{p(v,u) \mid (v,u) \in E,\ p(v,u) \ne 0\}$, which equals $D$ in our example. We conclude that the number of iterations of the algorithm on this example is at least $(D-1)^l/D$, which is exponential in $k = 2l + 1$.
Figure 3: An exponential time example. Labels attached to the arcs represent transition probabilities. Local rewards on all arcs are 0 except on the two loops, where $r(u_3,u_3) = 1$ and $r(v_3,v_3) = -1$.


4 Non-ergodic BWR-games 

The main difficulty in solving non-ergodic BWR-games is the fact that random positions may introduce transient classes. In fact, our algorithm developed for the ergodic case can be used to find the top/bottom classes, and if no random node enters these classes from outside, then we could call the same procedure (called Find-Top(·)/Find-Bottom(·)) recursively for the rest of the positions and find an optimum solution. However, this is not always the case. To handle this, we need to introduce parametrized games.

4.1 Preliminary results and basic lemmas 

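Let $\mathcal{G} = (G = (V = V_B \cup V_W \cup V_R, E), p, r)$ be a BWR-game. In what follows we will use the following notation. For $\theta \in \mathbb{R}$, let $S(\theta) := \{v \in V \mid \mu_{\mathcal{G}}(v) = \theta\}$. If $S(\theta) \ne \emptyset$, we refer to it as an ergodic class. If $S(\theta) = V$, the game $\mathcal{G}$ is said to be ergodic. Every BWR-game $\mathcal{G}$ has a unique sequence of numbers $\theta_1 < \theta_2 < \cdots < \theta_\ell$ such that $S(\theta_i) \ne \emptyset$ for all $i \in [\ell]$ and $\bigcup_{i=1}^{\ell} S(\theta_i) = V$.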

Proposition 4 Ergodic classes necessarily satisfy the following properties.

(i) There exists no arc $(v,u) \in E$ such that $v \in V_W \cap S(\theta_i)$, $u \in S(\theta_j)$, and $j > i$;

(ii) there exists no arc $(v,u) \in E$ such that $v \in V_B \cap S(\theta_i)$, $u \in S(\theta_j)$, and $j < i$;

(iii) for every $v \in V_W \cap S(\theta_i)$, there exists an arc $(v,u) \in E$ such that $u \in S(\theta_i)$;

(iv) for every $v \in V_B \cap S(\theta_i)$, there exists an arc $(v,u) \in E$ such that $u \in S(\theta_i)$;

(v) there exists no arc $(v,u) \in E$ such that $v \in V_R \cap S(\theta_1)$ and $u \notin S(\theta_1)$;

(vi) there exists no arc $(v,u) \in E$ such that $v \in V_R \cap S(\theta_\ell)$ and $u \notin S(\theta_\ell)$.

Proof All claims follow from the existence of a canonical form for $\mathcal{G}$ by Theorem 2, since the existence of arcs forbidden by (i), (ii), (v), or (vi), or the non-existence of arcs required by (iii) and (iv), would violate the value equations (C1). □

For a set of positions $S \subseteq V$, we define the black closure $\mathrm{cl}_B(S)$ (respectively, the black semi-closure $\mathrm{cl}'_B(S)$) of $S$ to be the set of positions which is recursively obtained from $S$ by adding

(1) a position $v \in V_B$ (respectively, $v \in V_B \cup V_R$), if some arc $(v,u) \in E$ satisfies $u \in S$, or

(2) a position $v \in V_W \cup V_R$ (respectively, $v \in V_W$), if all arcs $(v,u) \in E$ satisfy $u \in S$.

In words, $\mathrm{cl}_B(S)$ (respectively, $\mathrm{cl}'_B(S)$) is the set of positions from which Black can force a move into $S$ with probability 1 (respectively, with some positive probability).

The white closure and semi-closure of $S$, $\mathrm{cl}_W(S)$ and $\mathrm{cl}'_W(S)$, are defined analogously. Associated with a black closure $\mathrm{cl}_B(S)$ (respectively, a white closure $\mathrm{cl}_W(S)$) is a (partial) strategy $s_B(\mathrm{cl}_B(S))$ (respectively, $s_W(\mathrm{cl}_W(S))$) which guarantees Black (respectively, White) a move into $S$. Similar strategies are defined with respect to semi-closures.
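The recursive definition of the black closure and semi-closure is a least fixed-point computation, much like an attractor computation in graph games. A minimal Python sketch under our own assumptions (successor lists `succ`, owners `"W"`, `"B"`, `"R"`, the hypothetical function name `black_closure`, and a hypothetical five-position example) is:

```python
def black_closure(owner, succ, S, semi=False):
    """cl_B(S), or the semi-closure cl'_B(S) when semi=True, as a least fixed point."""
    some_arc = {"B", "R"} if semi else {"B"}   # rule (1): one arc into the set suffices
    all_arcs = {"W"} if semi else {"W", "R"}   # rule (2): every arc must enter the set
    C = set(S)
    changed = True
    while changed:
        changed = False
        for v in owner:
            if v in C:
                continue
            hits = [u in C for u in succ[v]]
            if (owner[v] in some_arc and any(hits)) or (owner[v] in all_arcs and all(hits)):
                C.add(v)
                changed = True
    return C


# Hypothetical example: s is the target set.
owner = {"s": "B", "a": "W", "b": "B", "c": "R", "d": "W"}
succ = {"s": ["s"], "a": ["s", "b"], "b": ["s", "d"], "c": ["s", "d"], "d": ["d"]}
print(sorted(black_closure(owner, succ, {"s"})))             # ['a', 'b', 's']
print(sorted(black_closure(owner, succ, {"s"}, semi=True)))  # ['a', 'b', 'c', 's']
```

The random position $c$ enters only the semi-closure, since one of its arcs leads to $d$, from which Black cannot reach $S$.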
Observation 1 If $\mu_{\mathcal{G}}(v) < z$ for all $v \in S$, then the same holds for all $v \in \mathrm{cl}_B(S)$. Furthermore, if $z = \theta_\ell$, then the same holds for all $v \in \mathrm{cl}'_B(S) \supseteq \mathrm{cl}_B(S)$.

We now introduce parametrized games. 

To a BWR-game $\mathcal{G}$ and a set $Y \subseteq V$ such that every $v \in (V_W \cup V_B) \cap Y$ has some arc $(v,u)$ with $u \in Y$, we associate the restriction $\mathcal{G}[Y]$ obtained by deleting all positions in $V \setminus Y$ and all arcs $(v,u)$ where $v \notin Y$ or $u \notin Y$. Note that $\mathcal{G}[Y]$ is not a BWR-game in general. Given such a restriction and $x \in \mathbb{R}$, we further define a BWR-game $\mathcal{G}[Y](x)$ and a BW-game $\tilde{\mathcal{G}}[Y](x)$ as follows:

$\mathcal{G}[Y](x)$: we add a new deterministic position $w = w(Y)$ with a self-loop $(w,w)$ with local reward $x$, and for every $v \in V_R \cap Y$ we add an arc from $v$ to $w(Y)$ with local reward 0 and transition probability $p(v, w(Y)) := 1 - \sum_{u \in Y} p(v,u)$, if this quantity is positive;

$\tilde{\mathcal{G}}[Y](x)$: we remove all arcs leaving $v \in V_R \cap Y$ and contract the set $V_R \cap Y$ into a deterministic position $w(Y)$ (black or white, arbitrarily) with a self-loop $(w(Y), w(Y))$ having local reward value $x$.

We call such games parametrized BWR-/BW-games (with parameter $x$).
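The construction of $\mathcal{G}[Y](x)$ is mechanical. The Python sketch below is a hypothetical illustration: our own encoding, the function name `parametrized_bwr`, and the fresh position name `"w"` are assumptions, and the colour given to $w(Y)$ is arbitrary because it has a single move. It restricts a game to $Y$, checks the precondition on deterministic positions, and redirects the missing probability mass of each random position to $w(Y)$.

```python
from fractions import Fraction as F


def parametrized_bwr(owner, arcs, Y, x, w="w"):
    """Sketch of G[Y](x): restrict to Y, then add w(Y) with a self-loop of reward x."""
    new_owner, new_arcs = {}, {}
    for v in Y:
        kept = [(u, r, p) for (u, r, p) in arcs[v] if u in Y]
        if owner[v] in ("W", "B") and not kept:
            raise ValueError(f"deterministic position {v!r} has no arc into Y")
        if owner[v] == "R":
            missing = 1 - sum(p for (_, _, p) in kept)
            if missing > 0:                  # arc to w(Y) with reward 0
                kept.append((w, 0, missing))
        new_owner[v], new_arcs[v] = owner[v], kept
    new_owner[w] = "W"                       # deterministic; colour is arbitrary
    new_arcs[w] = [(w, x, None)]
    return new_owner, new_arcs


# Hypothetical three-position game.
owner = {"W": "W", "B": "B", "R": "R"}
arcs = {
    "W": [("B", 4, None), ("R", 3, None)],
    "B": [("W", -4, None), ("R", -3, None)],
    "R": [("W", 1, F(1, 2)), ("B", -1, F(1, 2))],
}
g_owner, g_arcs = parametrized_bwr(owner, arcs, {"W", "R"}, 7)
print(g_arcs["R"], g_arcs["w"])
```

Here the random position $R$ loses its arc to $B \notin Y$, so the probability $\frac12$ of that arc is redirected to $w(Y)$, whose self-loop carries the parameter $x = 7$.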

For a BW-game $\mathcal{G}$, let $\nu(\mathcal{G})$ be the number of distinct optimal values achieved by positions of $\mathcal{G}$. For a BWR-game $\mathcal{G}$, we extend this by defining $\nu(\mathcal{G}) := \max_{Y \subseteq V,\ x \in \mathbb{R}} \nu(\tilde{\mathcal{G}}[Y](x))$.

For a situation $s = (s_W, s_B)$ such that $s(v) \in Y$ for all $v \in Y$, we denote by $s[Y] := (s_W[Y], s_B[Y])$ the restriction of $s$ to $Y$. For brevity, in what follows we will call $(v,u)$ a black, white, or random arc, depending on whether $v \in V_B$, $v \in V_W$, or $v \in V_R$, respectively.

Lemma 7 (i) Given a BWR-game $\mathcal{G} = (G = (V = V_B \cup V_W \cup V_R, E), P, r)$, let $\tilde{\mathcal{G}}$ be the BW-game obtained from $\mathcal{G}$ by replacing each random position $v \in V_R$ with a terminal deterministic position (black or white, arbitrarily) with a local reward $\mu_{\mathcal{G}}(v)$ on the self-loop $(v,v)$. Then $\mu_{\tilde{\mathcal{G}}}(v) = \mu_{\mathcal{G}}(v)$ for all $v \in V$.

(ii) Let $\tilde{\mathcal{G}}$ be as above and let $U \subseteq V$ be such that $\mu_{\mathcal{G}}(v) \ne \mu_{\mathcal{G}}(u)$ for all $v \in U$ and $u \in V \setminus U$. Then $\mu_{\tilde{\mathcal{G}}[U]}(v) = \mu_{\mathcal{G}}(v)$ for all $v \in U$.

Proof (i) By Theorem 2, there is a potential $x : V \to \mathbb{R}$ transforming $\mathcal{G}$ to canonical form and certifying, for all positions $v \in V$, that the value of $v$ in $\mathcal{G}$ is $\mu_{\mathcal{G}}(v)$. It is obvious that the same potential gives a canonical form for $\tilde{\mathcal{G}}$ (namely, given by the canonical form equations for $\mathcal{G}$, with the equations for the random positions dropped), and hence, by Theorem 5, certifies that the value of $v$ in $\tilde{\mathcal{G}}$ is also $\mu_{\mathcal{G}}(v)$. The proof of (ii) also follows from this argument, since $U$ consists of complete ergodic classes of $\mathcal{G}$. □


We will write $V(\theta_i) := S(\theta_i) \cup \{w(S(\theta_i))\}$ and denote by $\mathbf{1}$ the vector of all ones of appropriate dimension.

For $i \in [\ell]$, define $\mathcal{G}[\theta_i]$ to be the game $\mathcal{G}[S(\theta_i)](\theta_i)$. Proposition 4 guarantees that the game $\mathcal{G}[\theta_i]$ is well-defined, that is, for every $v \in S(\theta_i)$ there is at least one arc going out of $v$ in $\mathcal{G}[\theta_i]$.

The following two lemmas state that if we identify ergodic classes together with their values, 
then we can find an optimal strategy in the whole game by solving each ergodic class independently. 

Lemma 8 For all $i \in [\ell]$ and $v \in V(\theta_i)$, it holds that $\mu_{\mathcal{G}[\theta_i]}(v) = \theta_i$.

Proof Consider a potential transformation $x : V \to \mathbb{R}$ bringing $\mathcal{G}$ to canonical form. Let $x' : V(\theta_i) \to \mathbb{R}$ be the vector of potentials defined as follows: $x'(v) := x(v)$ if $v \in S(\theta_i)$, and

$$x'(w(S(\theta_i))) := x(v) + \frac{1}{p(v, w(S(\theta_i)))}\Big(\sum_{u \in S(\theta_i)} p(v,u)\big(r(v,u) + x(v) - x(u)\big) - \theta_i\Big) \qquad (13)$$

for $v \in V_R \cap S(\theta_i)$. Then it is immediate that the pair $(x', \theta_i\mathbf{1})$ satisfies the canonical form conditions (C1) and (C2) at any deterministic position $v \in V(\theta_i)$. Furthermore, for $v \in V_R \cap S(\theta_i)$, $(x', \theta_i\mathbf{1})$ also satisfies (C1) since, trivially, $\theta_i = \sum_{u \in S(\theta_i)} p(v,u)\,\theta_i + p(v, w(S(\theta_i)))\,\theta_i$, and moreover,

$$\theta_i = \sum_{u \in S(\theta_i)} p(v,u)\big(r(v,u) + x(v) - x(u)\big) + p(v, w(S(\theta_i)))\big(0 + x(v) - x'(w(S(\theta_i)))\big)$$

holds by (13). □


Lemma 9 For $i \in [\ell]$, let $(s^i_W, s^i_B)$ be a pair of optimal strategies in $\mathcal{G}[\theta_i]$. Then the situation $s^* = (s^*_W, s^*_B)$ obtained by concatenating all these strategies together (that is, $s^*_W(v) := s^i_W(v)$ for $v \in V_W \cap S(\theta_i)$ and $s^*_B(v) := s^i_B(v)$ for $v \in V_B \cap S(\theta_i)$) is optimal in $\mathcal{G}$.

Proof For a situation $s$, denote by $\mu_s(v)$ the effective payoff starting from position $v$ in the Markov chain $\mathcal{G}_s$ obtained from $\mathcal{G}$ by fixing the arcs determined by $s$. The lemma follows from the following claims.

Claim 1 Let $s = (s^*_W, s_B)$ or $s = (s_W, s^*_B)$, where $s_B \in S_B$ and $s_W \in S_W$ are arbitrary strategies of Black and White, respectively. Then any absorbing class $U$ in the Markov chain $\mathcal{G}_s$ satisfies $U \subseteq S(\theta_i)$ for some $i \in [\ell]$.

Proof Without loss of generality, consider $s = (s^*_W, s_B)$ for some $s_B \in S_B$. Suppose that there is an absorbing class $U$ in $\mathcal{G}_s$ such that $U \not\subseteq S(\theta_i)$ for all $i \in [\ell]$. Let $i$ be the largest index such that $U \cap S(\theta_i) \ne \emptyset$. By the strong connectivity of the subgraph induced by $U$ in $\mathcal{G}_s$, there must exist an arc $(v,u)$ from some $v \in U \cap S(\theta_i)$ to some $u \in U \cap S(\theta_j) \ne \emptyset$, for some $j < i$. By Proposition 4(ii), $v \notin V_B$, and by the choice of $s$, $v \notin V_W$. Then $v \in V_R$, and since $U$ is absorbing, there are no arcs in $G$ from $v$ to any $u \notin U$, and in particular to no $u \in S(\theta_j)$ with $j > i$. We get the following contradiction from (C1):

$$\theta_i = \sum_{u \in S(\theta_i)} p(v,u)\,\theta_i + \sum_{u \notin S(\theta_i)} p(v,u)\,\mu_{\mathcal{G}}(u) < \theta_i,$$

since $\mu_{\mathcal{G}}(u) < \theta_i$ for all $u \notin S(\theta_i)$ with $p(v,u) > 0$, and there is at least one such $u$. □


Claim 2 $\mu_{s^*}(v) = \mu_{\mathcal{G}}(v)$ for all $v \in V$.

Proof For non-transient positions, the claim follows from Claim 1, which implies that any absorbing class of $\mathcal{G}_{s^*}$, included in $S(\theta_i)$, is also an absorbing class in the Markov chain $(\mathcal{G}[\theta_i])_{s^i}$. In particular, the limiting distribution and hence the value of any position $v$ in such an absorbing class are identical to the corresponding ones in $(\mathcal{G}[\theta_i])_{s^i}$, that is, $\mu_{s^*}(v) = \mu_{\mathcal{G}[\theta_i]}(v)$. By Lemma 8 we get $\mu_{s^*}(v) = \theta_i = \mu_{\mathcal{G}}(v)$.

Consider now the set of transient positions $T$. Using the notation in Section 2.1, let $C_1, \ldots, C_h$ be the absorbing classes. Then it follows from (L3) and (L4) in Section 2.1 that $\mu_{s^*}[T] := (\mu_{s^*}(v) : v \in T)$ is the unique solution of the equation $A\mathbf{x} = \mathbf{a}$, where $A := I - P_{s^*}[T;T]$, $\mathbf{a} := \sum_i \mu_i P_{s^*}[T;C_i]\mathbf{1}$, and $\mu_i$ is the value $\mu_{\mathcal{G}}(v)$ for $v \in C_i$. Note that this equation is the value equation given in condition (C1) of the canonical form, where the value $\mu(v)$ is set to $\mu_{\mathcal{G}}(v)$ for all positions $v$ in the absorbing classes. Since the vector $\mathbf{x} := (\mu_{\mathcal{G}}(v) : v \in T)$ satisfies this equation, it is the unique solution, implying that $\mu_{s^*}(v) = \mu_{\mathcal{G}}(v)$ for all $v \in T$. □


Claim 3 Let s' = (siV, sb) and s" = (sw, s^), where sb G Sb and sw G Sw are arbitrary strategies 
of Black and White, respectively. Then pg^,(v) > pg(y) and pg^„{v) < pg(v). 

Proof Without loss of generality we only prove the claim for s'. Let ^ C U be the set of absorbing 
positions in Gs', and ^ G K'® be the vector of corresponding values. Then the vector of values for the 


19 


set of transient positions T := \ S' in Qg/ is given by the unique solution of the following equation 

in y 

y = ^ ( ^ ) = (14) 

where i? = [ B | Z? ] is a stochastic matrix with B := Ps/[T;T] and D := Ps/[T;S]. Similarly, the 
value equation for the set of positions T in Qg* is given by 

X = A ^ ^ ^ =Ax + Cr], (15) 

where A= [ A | C ], A := Pg. [T;T], D := Pg. [r;S], x £ MA and r] £ are the vector of values 
of the positions in T and S, respectively. 

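By Claim 2, $[\,x \mid \eta\,] = \mu_{s^*}$ satisfies (15), and by Claim 1 we have $\eta \le \xi$, since $s^*_W$ is an optimal White strategy in $\mathcal{G}[S(\theta_i)]$ for all $i$. Note that $\hat{A}$ and $\hat{B}$ are identical on any row that corresponds to a position $v \in V_W \cup V_R$. Furthermore, Proposition 4(ii) implies the following shifting property for any $v \in V_B$:

$$\hat{A}(v,u) = 1 \implies \hat{B}(v,u') = 1 \text{ for some } u' \text{ such that } \mu_{\mathcal{G}}(u') \ge \mu_{\mathcal{G}}(u),$$

which in turn implies that

$$\hat{A}\begin{pmatrix} x \\ \eta \end{pmatrix} \le \hat{B}\begin{pmatrix} x \\ \eta \end{pmatrix} \le \hat{B}\begin{pmatrix} x \\ \xi \end{pmatrix}, \quad\text{or}\quad Ax + C\eta \le Bx + D\xi. \qquad (16)$$

By (L3) in Section 2.1, $(I - B)^{-1}$ exists and is non-negative. Combining (15) and (16) gives $(I - B)x \le D\xi$. Multiplying by the non-negative matrix $(I - B)^{-1}$ and using (14), we get

$$x \le (I - B)^{-1} D\xi = y.$$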


The claim follows. 


□ 


This completes the proof of Lemma 9.


□ 
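The last step of the proof of Claim 3 uses only one fact: $(I - B)^{-1} = \sum_{t \ge 0} B^t$ is entrywise non-negative. Equivalently, suppose $x \le Bx + D\xi$ componentwise. Applying the monotone map $z \mapsto Bz + D\xi$ repeatedly, starting from $x$, never decreases any coordinate, and the iterates converge to the unique fixed point $y$ of (14). Hence $x \le y$. The Python sketch below checks this numerically on a two-position example. The matrix $B$, the vector `c` standing in for $D\xi$, and the starting vector are illustrative assumptions, not data from the paper.

```python
def affine_step(B, c, z):
    """One application of the map z -> B z + c, where c stands for D xi."""
    return [sum(b * zj for b, zj in zip(row, z)) + ci for row, ci in zip(B, c)]


def iterate_from(B, c, x, steps=200):
    """All iterates x, F(x), F(F(x)), ... of F(z) = B z + c."""
    trace = [list(x)]
    for _ in range(steps):
        trace.append(affine_step(B, c, trace[-1]))
    return trace


# Toy transient block B (non-negative, spectral radius < 1) and vector c = D xi.
B = [[0.0, 0.5], [1 / 3, 0.0]]
c = [1.5, -2 / 3]
x = [0.0, -1.0]                      # satisfies x <= B x + c componentwise
trace = iterate_from(B, c, x)
print(trace[-1])                     # close to y = [1.4, -0.2], the solution of y = B y + c
```

The monotonicity of the iterates is exactly the non-negativity of $B$, and their convergence reflects the transience of $T$.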


Remark 3 Lemma 9 states that, if we know the values of all the positions, then we can get uniformly optimal strategies by solving $\ell$ different ergodic games. It should be noted, however, that a pseudo-polynomial algorithm for the ergodic case, such as the one described in the previous section, does not in general yield a pseudo-polynomial algorithm for solving the game. This holds even if we know those values and the corresponding ergodic classes. The reason is that in our reduction in the proof of Lemma 9 we introduce local rewards on self-loops $(v,v)$ of value $\mu_{\mathcal{G}}(v)$, which might be exponentially small in $n$, even for games with a single random position; see, e.g., [BEGM15b]. This is due to the fact that some random positions are transient, and hence the precision estimate in Lemma [?] does not apply.

In view of the above remark, we need to analyze the structural dependence of the ergodic classes 
on the guessed values of the random positions. We achieve this by considering a parametrized 
BW-game as described in the next lemma. 

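Lemma 10 Let $v$ be a position in a parametrized BW-game $\mathcal{G}(x)$, and $x, y \in \mathbb{R}$.

(i) If $\mu_{\mathcal{G}(x)}(v) < x$, then for any $y \ge \mu_{\mathcal{G}(x)}(v)$, $\mu_{\mathcal{G}(y)}(v) = \mu_{\mathcal{G}(x)}(v)$;

(ii) if $\mu_{\mathcal{G}(x)}(v) > x$, then for any $y \le \mu_{\mathcal{G}(x)}(v)$, $\mu_{\mathcal{G}(y)}(v) = \mu_{\mathcal{G}(x)}(v)$;

(iii) if $\mu_{\mathcal{G}(x)}(v) = x$, then for any $y \ge x$, $y \ge \mu_{\mathcal{G}(y)}(v) \ge x$;

(iv) if $\mu_{\mathcal{G}(x)}(v) = x$, then for any $y \le x$, $y \le \mu_{\mathcal{G}(y)}(v) \le x$.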
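Proof We prove (i) and (iii); (ii) and (iv) are analogous.

(i) Suppose that $\mu_{\mathcal{G}(x)}(v) < x$ and $y \ge \mu_{\mathcal{G}(x)}(v)$. Let $(s^*_W, s^*_B)$ be an optimal situation in $\mathcal{G}(x)$.

Consider first a strategy $s := (s^*_W, s_B)$, for some $s_B \in S_B$. By optimality of $s^*_W$ in $\mathcal{G}(x)$, $v$ reaches in $\mathcal{G}_s(x)$ either the self-loop $(w,w)$ or another cycle $C$ of mean value (that is, $\sum_{e \in C} r(e)/|C|$) at least $\mu_{\mathcal{G}(x)}(v)$. In both cases, since $y \ge \mu_{\mathcal{G}(x)}(v)$, $v$ reaches in $\mathcal{G}_s(y)$ a cycle of mean value at least $\mu_{\mathcal{G}(x)}(v)$, i.e., $\mu_{\mathcal{G}_s(y)}(v) \ge \mu_{\mathcal{G}(x)}(v)$.

Now consider a strategy $s := (s_W, s^*_B)$, for some $s_W \in S_W$. The optimality of $s^*_B$ in $\mathcal{G}(x)$ implies that $v$ does not reach the self-loop $(w,w)$ in $\mathcal{G}_s(x)$, since $x > \mu_{\mathcal{G}(x)}(v)$. Hence $v$ reaches the same cycle in $\mathcal{G}_s(y)$, and $\mu_{\mathcal{G}_s(y)}(v) = \mu_{\mathcal{G}_s(x)}(v) \le \mu_{\mathcal{G}(x)}(v)$. Together with the first case, this gives $\mu_{\mathcal{G}(y)}(v) = \mu_{\mathcal{G}(x)}(v)$.

(iii) Suppose that $\mu_{\mathcal{G}(x)}(v) = x < y$. Let $(s^*_W, s^*_B)$ be an optimal situation in $\mathcal{G}(x)$.

In $\mathcal{G}(y)$, let us consider a strategy $s := (s^*_W, s_B)$, for some $s_B \in S_B$. By optimality of $s^*_W$ in $\mathcal{G}(x)$, $v$ reaches in $\mathcal{G}_s(x)$ either the self-loop $(w,w)$ or another cycle of mean value at least $\mu_{\mathcal{G}(x)}(v) = x$. This implies that, in $\mathcal{G}(y)$, we have either $\mu_{\mathcal{G}_s(y)}(v) = y > x$ or $\mu_{\mathcal{G}_s(y)}(v) \ge x$. In both cases, we get $\mu_{\mathcal{G}_s(y)}(v) \ge x$.

Similarly, let $s := (s_W, s^*_B)$, for some $s_W \in S_W$. Then $v$ reaches in $\mathcal{G}_s(x)$ either the self-loop $(w,w)$ or another cycle of mean value at most $\mu_{\mathcal{G}(x)}(v) = x < y$. This implies that, in $\mathcal{G}(y)$, we have either $\mu_{\mathcal{G}_s(y)}(v) = y$ or $\mu_{\mathcal{G}_s(y)}(v) \le x < y$. In both cases, we get $\mu_{\mathcal{G}_s(y)}(v) \le y$. Hence $x \le \mu_{\mathcal{G}(y)}(v) \le y$. □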


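Corollary 1 For any position $v \in V$ of a parametrized BW-game $\mathcal{G}(x)$ there is an interval $I(v) := [\lambda_1(v), \lambda_2(v)]$ such that

(i) $\mu_{\mathcal{G}(x)}(v) = \lambda_2(v)$ if $x \ge \lambda_2(v)$;

(ii) $\mu_{\mathcal{G}(x)}(v) = \lambda_1(v)$ if $x \le \lambda_1(v)$;

(iii) $\mu_{\mathcal{G}(x)}(v) = x$ if $x \in I(v)$.

Proof Consider a position $v \in V$. Then $\lambda_1(v) := \mu_{\mathcal{G}(-R)}(v)$ and $\lambda_2(v) := \mu_{\mathcal{G}(R)}(v)$ satisfy the conditions of the claim. Indeed, Lemma 10 (i) and (ii) imply, respectively, claims (i) and (ii) of the corollary. Moreover, let $y \in I(v)$. Applying Claim (iii) of Lemma 10 with $x = \lambda_1(v)$ gives $\lambda_1(v) \le \mu_{\mathcal{G}(y)}(v) \le y$. Applying Claim (iv) with $x = \lambda_2(v)$ gives $y \le \mu_{\mathcal{G}(y)}(v) \le \lambda_2(v)$. Together these yield Claim (iii) of the corollary. □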
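Cases (i) to (iii) of Corollary 1 can be summarized in a single formula. It makes explicit that the value of $v$ is a non-decreasing, piecewise linear function of $x$ with slope $1$ on $I(v)$ and slope $0$ elsewhere:

$$\mu_{\mathcal{G}(x)}(v) = \min\bigl\{\max\{x, \lambda_1(v)\},\ \lambda_2(v)\bigr\}, \qquad x \in [-R, R].$$

For example, if $x \ge \lambda_2(v)$, then $\max\{x, \lambda_1(v)\} = x$ and the minimum is $\lambda_2(v)$, which is case (i). If $x \le \lambda_1(v)$, then the maximum is $\lambda_1(v) \le \lambda_2(v)$, which is case (ii).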

Let $\mathcal{G}(x)$ be a parametrized BW-game with a self-loop of reward $x$. By Corollary 1, the end-points of the intervals $I(v)$, for $v \in V$, partition the range $[-R, R]$ into a set of at most $2\nu(\mathcal{G}) + 1 \le 2n + 1$ intervals $\mathcal{I} := \mathcal{I}(\mathcal{G}(x))$. Over each of these intervals, the "structure" of the game with respect to the dependence of the positions' values on $x$ is fixed. That is, for each $I = [\lambda_1(I), \lambda_2(I)] \in \mathcal{I}$, there is a uniquely defined partition $S^-(I) \cup S^0(I) \cup S^+(I) = V$ such that, for all $x \in I$,

• $\mu_{\mathcal{G}(x)}(v) = \lambda_2(v)$ for all $v \in S^-(I)$, where $\lambda_2(v) \le \lambda_1(I)$;

• $\mu_{\mathcal{G}(x)}(v) = x$ for all $v \in S^0(I)$;

• $\mu_{\mathcal{G}(x)}(v) = \lambda_1(v)$ for all $v \in S^+(I)$, where $\lambda_1(v) \ge \lambda_2(I)$.

Indeed, $S^0(I)$ is defined as the set $S$ such that $I = \bigcap_{v \in S} I(v)$; see Figure 4 for an illustration.
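Once the end-points $\lambda_1(v), \lambda_2(v)$ are known, the construction of $\mathcal{I}$ and of the partition $S^-(I) \cup S^0(I) \cup S^+(I)$ is purely combinatorial. The following Python sketch splits $[-R, R]$ at all end-points and classifies every position on each resulting interval according to the three bullets above. It is a hypothetical helper, and its name and input format (a dictionary from positions to pairs $(\lambda_1(v), \lambda_2(v))$) are our own choices. For simplicity it only produces the non-degenerate intervals between consecutive end-points.

```python
from fractions import Fraction


def interval_structure(lam, R):
    """Split [-R, R] at all end-points lambda_1(v), lambda_2(v); classify positions.

    lam maps each position v to the pair (lambda_1(v), lambda_2(v)).
    Returns a list of (I, S_minus, S_zero, S_plus) with I = (left, right).
    """
    ends = {Fraction(e) for pair in lam.values() for e in pair}
    points = sorted(ends | {Fraction(-R), Fraction(R)})
    result = []
    for left, right in zip(points, points[1:]):
        s_minus, s_zero, s_plus = set(), set(), set()
        for v, (l1, l2) in lam.items():
            if l1 <= left and right <= l2:
                s_zero.add(v)      # I is contained in I(v): value equals x on I
            elif l2 <= left:
                s_minus.add(v)     # value frozen at lambda_2(v) <= lambda_1(I)
            else:
                s_plus.add(v)      # then lambda_1(v) >= right: value frozen at lambda_1(v)
        result.append(((left, right), s_minus, s_zero, s_plus))
    return result


for I, s_minus, s_zero, s_plus in interval_structure({'u': (-1, 2), 'v': (0, 3)}, 4):
    print(I, sorted(s_minus), sorted(s_zero), sorted(s_plus))
```

For instance, with $I(u) = [-1, 2]$, $I(v) = [0, 3]$ and $R = 4$, the interval $[0, 2]$ has $S^0 = \{u, v\}$. On $[2, 3]$, position $u$ moves to $S^-$ while $v$ stays in $S^0$.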

It is not difficult to see that for each position $v$, both $\lambda_1(v)$ and $\lambda_2(v)$ are rational numbers with integer denominator not exceeding $n$ and integer numerator not exceeding $nR$ in absolute value.

Lemma 11 Given a parametrized BW-game $\mathcal{G}(x)$ and a position $v$, we can find $\lambda_1(v)$ and $\lambda_2(v)$ in time $O(n^{?}R\log(nR))$.

Proof The smallest possible value of $\mu_{\mathcal{G}(x)}(v)$ is obtained for $x = -R$ and the largest for $x = R$. After this we can do two binary searches to locate $\lambda_1$ and $\lambda_2$. In each step of the binary search we solve a BW-game with integer coefficients except for $x$, which is rational with denominator at most $n$, so the required precision is only $1/n^2$. Consequently, we have to solve $O(\log(nR))$ BW-games, and hence the claimed complexity bound follows by [?]. □

Figure 4: In this figure we illustrate Corollary 1. The two piecewise linear functions represent the value functions of $\mathcal{G}(x)$ for initial positions $u$ and $v$, respectively, as functions of $x$. For the interval $I = [\lambda_1(u), \lambda_2(u)]$ we have $\{u, v\} \subseteq S^0(I)$, while for $I' = [\lambda_2(u), \lambda_2(v)]$ we have $u \in S^-(I')$ and $v \in S^0(I')$.
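The precision argument in the proof of Lemma 11 can be made concrete. Two distinct rationals with denominators at most $n$ differ by at least $1/n^2$. Once a binary search has narrowed an end-point down to an interval of length less than $1/(2n^2)$, the exact rational can therefore be recovered by rounding to the nearest fraction with denominator at most $n$. In the Python sketch below, the function `value(x)` is a hypothetical stand-in for solving the BW-game $\mathcal{G}(x)$, with made-up end-points; it simply behaves as described in Corollary 1. The sketch locates $\lambda_1$ and $\lambda_2$ through the predicates $\mu_{\mathcal{G}(x)}(v) \le x$ and $\mu_{\mathcal{G}(x)}(v) < x$. This is one possible way to organize the two binary searches, not necessarily the one used in the proof. It also assumes that the predicate is false at $-R$ and true at $R$; a caller would check these end-points first.

```python
from fractions import Fraction


def locate_threshold(pred, lo, hi, n):
    """Return the rational t (denominator <= n) at which pred switches.

    Assumes pred(x) is False for x < t and True for x > t,
    pred(lo) is False and pred(hi) is True.
    """
    lo, hi = Fraction(lo), Fraction(hi)
    # Distinct rationals with denominators <= n are at least 1/n^2 apart, so an
    # interval of length < 1/(2 n^2) around t contains no other candidate.
    while hi - lo >= Fraction(1, 2 * n * n):
        mid = (lo + hi) / 2
        if pred(mid):
            hi = mid          # pred(mid) True implies mid >= t
        else:
            lo = mid          # pred(mid) False implies mid <= t
    # t lies in [lo, hi]; round the midpoint to the closest fraction with denominator <= n.
    return ((lo + hi) / 2).limit_denominator(n)


# Hypothetical stand-in for solving G(x): the value of v behaves as in Corollary 1.
lam1, lam2, R, n = Fraction(-3, 2), Fraction(7, 3), 10, 3


def value(x):
    return min(max(x, lam1), lam2)


print(locate_threshold(lambda x: value(x) <= x, -R, R, n))  # -3/2, i.e. lambda_1
print(locate_threshold(lambda x: value(x) < x, -R, R, n))   # 7/3, i.e. lambda_2
```

The number of iterations is $O(\log(R n^2)) = O(\log(nR))$, matching the count in the proof.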

We can now apply the above lemma to each position $v \in V$ and obtain a set of intervals $\{I(v) \mid v \in V\}$. From these we can obtain the set of intervals $\mathcal{I}(\mathcal{G}(x))$. We shall call this procedure BW-FindIntervals($\mathcal{G}(x)$). It finds the set of intervals $\mathcal{I}(\mathcal{G}(x))$ together with the partitions $S^-(I) \cup S^0(I) \cup S^+(I) = V$, for $I \in \mathcal{I}(\mathcal{G}(x))$.

In the algorithm, we will use the above results assuming that all random positions have the same value $x$, which we do not know exactly. Using the above structure, we will guess the interval $I$ containing $x$. Suppose that our guess is correct. It follows that the sets $S^+(I)$ and $S^-(I)$ contain no random positions. Hence they provide a set of deterministic ergodic classes for which we obtain the values and optimal strategies by solving BW-games. On the other hand, the set $S^0(I)$ contains random positions for which the BW-game $\mathcal{G}[S^0(I)](x)$ does not carry enough information to obtain the optimal strategies. For this, we consider the parametrized BWR-game $\mathcal{G}[S^0(I)](x)$ and find the interval of $x$-values for which this game is ergodic. By Lemmas 5 and [?], the optimal strategies obtained for such an ergodic BWR-game yield the optimal strategies for the corresponding set in $\mathcal{G}$.

Lemma 12 For a parametrized BWR-game $\mathcal{G}(x)$, the set $I(\mathcal{G}(x)) := \{x \in [-R, R] \mid \mathcal{G}(x) \text{ is ergodic}\}$ forms a closed (possibly empty) interval in $[-R, R]$.

Proof Suppose that $\mathcal{G}(x)$ is non-ergodic. Then either

(i) there is a strategy $s^*_B \in S_B$ such that for all $s_W \in S_W$, there is a position $v$ belonging to an absorbing class $C_{s_W}$ in the Markov chain $\mathcal{G}_{(s_W, s^*_B)}(x)$ with value $\mu_{\mathcal{G}_{(s_W, s^*_B)}(x)}(v) < x$, or

(ii) there is a strategy $s^*_W \in S_W$ such that for all $s_B \in S_B$, there is a position $v$ belonging to an absorbing class $C_{s_B}$ in the Markov chain $\mathcal{G}_{(s^*_W, s_B)}(x)$ with value $\mu_{\mathcal{G}_{(s^*_W, s_B)}(x)}(v) > x$.

In case (i), $C_{s_W}$ does not include $w$, so the same strategy $s^*_B$ guarantees that $\mu_{\mathcal{G}_{(s_W, s^*_B)}(y)}(v) < x < y$ for any $y > x$. In case (ii), the strategy $s^*_W$ guarantees that $\mu_{\mathcal{G}_{(s^*_W, s_B)}(y)}(v) > x > y$ for any $y < x$. Thus we conclude that the game $\mathcal{G}(y)$ remains non-ergodic either for all $y < x$ or for all $y > x$.

Suppose that there exist $\tau_1 < \tau_2$ such that $\mathcal{G}(\tau_1)$ and $\mathcal{G}(\tau_2)$ are ergodic. Then the claim above implies that $\mathcal{G}(\tau)$ is ergodic for all $\tau \in [\tau_1, \tau_2]$. □


Lemma 13 In a parametrized BWR-game $\mathcal{G}(x)$, let $\tau_1 < \tau_2$ be two real numbers in $I(\mathcal{G}(x))$, and let $s^*_W \in S_W$ and $s^*_B \in S_B$ be optimal White and Black strategies in the games $\mathcal{G}(\tau_2)$ and $\mathcal{G}(\tau_1)$, respectively. Then $(s^*_W, s^*_B)$ is a pair of optimal strategies in $\mathcal{G}(x)$ for all $x \in [\tau_1, \tau_2]$.

Proof By the definition of $\mathcal{G}(x)$, the position $w$ with the self-loop forms a single-vertex absorbing class with value $x$. Thus, for all $x \in [\tau_1, \tau_2]$, all other positions have the same value $x$, since $\mathcal{G}(x)$ is an ergodic game. We claim that for all $v \in V$ and $s_W \in S_W$ we have $\mu_{\mathcal{G}_{(s_W, s^*_B)}(x)}(v) \le x$.

To see this, consider a strategy $s = (s_W, s^*_B)$ for some $s_W \in S_W$. In the Markov chain $\mathcal{G}_s(\tau_1)$, any position $v$ in an absorbing class not formed by the singleton $\{w\}$ has value $\mu_{\mathcal{G}_s(\tau_1)}(v) \le \tau_1 \le x$. Such a class does not involve the reward $x$, so $v$ also has value $\mu_{\mathcal{G}_s(x)}(v) \le \tau_1 \le x$ in $\mathcal{G}_s(x)$. The value of any transient position $v$ in $\mathcal{G}_s(x)$ is a convex combination of the values in the absorbing classes, so it is also at most $x$.

Analogously, we obtain that for all $v \in V$ and $s_B \in S_B$, $\mu_{\mathcal{G}_{(s^*_W, s_B)}(x)}(v) \ge x$. □

To be able to compute $I(\mathcal{G}(x))$ and the corresponding optimal strategies mentioned in the previous lemma efficiently, we need a procedure to find the top class of a given BWR-game. This is provided in the next lemma.

Lemma 14 Let $\mathcal{G} := \mathcal{G}(p/q)$ be a BWR-game obtained from the parametrized BWR-game $\mathcal{G}(x)$, where $p, q$ are integers such that $p \le$ [?] and $q \le$ [?]. Then we can determine both the bottom and top ergodic classes, and find a pair of strategies proving this, in time $\mathrm{poly}(n)(2D)^{O(k)}R\log R$, using algorithm PUMP($\mathcal{G}, \varepsilon$) with $\varepsilon$ as in (2).

Proof We first show how to find the top ergodic class, that is, the set of positions that have the highest value in $\mathcal{G}$. We apply Algorithm 2, called BWR-FindTop($\mathcal{G}$), which works by calling PUMP($\mathcal{G}, \varepsilon$) with $\varepsilon$ as in (2). If no contra-ergodic partition is produced, then the game is ergodic and its solution has already been found by the selection of $\varepsilon$, in view of Theorem 2. Otherwise, any position $v \in V^-$, and hence any position in $\mathrm{cl}_B(V^-)$, has value less than the top value. This can be seen from the value equations (C1), using induction as positions are added to $\mathrm{cl}_B(V^-)$. Thus, if we remove all these positions, we get a well-defined game $\mathcal{G}' := \mathcal{G}[V \setminus \mathrm{cl}_B(V^-)]$ which includes the top class of $\mathcal{G}$.

By induction, BWR-FindTop($\mathcal{G}'$) returns the top class $T \subseteq Y := V \setminus \mathrm{cl}_B(V^-)$ in $\mathcal{G}'$, and we claim that $T$ is the top class also in $\mathcal{G}$. To see this, note that by Proposition [?] there are no black or random arcs from $T$ to $Y \setminus T$. By the definition of $Y$, there are no black or random arcs from $T$ to $V \setminus Y$ either. Thus, if $s^* = s^*[T]$ is the situation returned by BWR-FindTop($\mathcal{G}'$), then $s^*_W$ guarantees for White in $\mathcal{G}$ the same value $y$ guaranteed by $s^*_W$ in $\mathcal{G}'$.

Furthermore, since $T$ is the top class in $\mathcal{G}'$, there is a Black strategy $s'_B = s'_B[Y]$ such that $\mu_{\mathcal{G}'_{(s_W, s'_B)}}(v) \le y$ for all White strategies $s_W = s_W[Y]$ and all $v \in Y$. By conditions (ii) and (iii) of the contra-ergodic partition, there is a strategy $\bar{s}_B = \bar{s}_B[V^-]$ that forces White to stay in $V^-$ and ensures that $\mu_{\mathcal{G}_{(s_W, \bar{s}_B)}}(v) < y$ for all $v \in V^-$ and $s_W \in S_W$. It follows that the strategy $s_B \in S_B$ obtained by concatenating $s'_B[Y]$, $s_B[\mathrm{cl}_B(V^-) \setminus V^-]$ and $\bar{s}_B[V^-]$ satisfies $\mu_{\mathcal{G}_{(s_W, s_B)}}(v) \le y$ for all $v \in V$ and $s_W \in S_W$. This proves our claim.

Finding the bottom ergodic class is analogous and can be done by a similar procedure BWR-FindBottom($\mathcal{G}$). □
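The recursive call in BWR-FindTop is a tail call. The procedure can therefore equivalently be written as a loop that keeps discarding $\mathrm{cl}_B(V^-)$ until PUMP reports an ergodic game. The Python sketch below shows only this control flow. The callables `pump` and `black_closure` are hypothetical stand-ins for PUMP($\mathcal{G}[S], \varepsilon$) and for computing $\mathrm{cl}_B(V^-)$ inside $\mathcal{G}[S]$. The toy instances in the usage example are invented.

```python
def bwr_find_top(positions, pump, black_closure):
    """Loop form of BWR-FindTop: returns (T, s) with T the top class.

    pump(S): hypothetical stand-in for PUMP(G[S], eps); returns either
        ('partition', V_minus) for a contra-ergodic partition of G[S], or
        ('solved', s) with a situation s solving the ergodic game G[S].
    black_closure(S, X): hypothetical stand-in for cl_B(X) computed in G[S].
    """
    current = frozenset(positions)
    while True:
        kind, payload = pump(current)
        if kind == 'solved':
            return set(current), payload
        # Every position of cl_B(V^-) is below the top value; discard them and
        # continue with G[V \ cl_B(V^-)].  A genuine V^- is non-empty, so the
        # loop terminates after at most |V| iterations.
        current = current - frozenset(black_closure(current, payload))


# Invented toy run: the top class is {3, 4}; position 2 lies in cl_B({1}).
def toy_pump(S):
    return ('solved', 'sigma') if S == frozenset({3, 4}) else ('partition', {min(S)})


def toy_closure(S, X):
    return set(X) | ({2} if 1 in X else set())


print(bwr_find_top({1, 2, 3, 4}, toy_pump, toy_closure))  # ({3, 4}, 'sigma')
```

Writing the procedure as a loop makes the bound on the number of PUMP calls visible: each iteration removes at least one position.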


Algorithm 2 BWR-FindTop($\mathcal{G}$)

Input: A BWR-game $\mathcal{G} = (G = (V, E), p, r)$

Output: A set $T \subseteq V$ such that $\mu_{\mathcal{G}}(v) = \max_{u \in V} \mu_{\mathcal{G}}(u)$ for all $v \in T$, and a situation $s^*$ solving the game $\mathcal{G}[T]$

1: Set $\varepsilon$ as in (2)
2: if PUMP($\mathcal{G}, \varepsilon$) returns a contra-ergodic partition then
3:     return BWR-FindTop($\mathcal{G}[V \setminus \mathrm{cl}_B(V^-)]$)
4: else
5:     Let $s^*$ be the situation returned by PUMP($\mathcal{G}, \varepsilon$)
6:     return $(V, s^*)$
7: end if

Lemma 15 Let us consider a parametrized BWR-game $\mathcal{G}(x)$ and let $I(\mathcal{G}(x)) = [\tau_1, \tau_2]$. We can compute $\tau_1$, $\tau_2$, and the optimal strategies described in Lemma 13 in time $\mathrm{poly}(n)(2D)^{O(k)}R\log^2 R$.

Proof We employ binary search, calling in each step the procedure BWR-FindTop($\mathcal{G}(x)$), defined in the previous lemma, with different guessed values of the parameter $x$. Suppose that we start the search on the interval $[\lambda_1, \lambda_2]$. If the top class does not include all the positions, then the game is non-ergodic. In that case the procedure returns a position $v \in V$ and either a strategy $s_B \in S_B$ certifying that $\mu_{\mathcal{G}(x)}(v) < x$ or a strategy $s_W \in S_W$ certifying that $\mu_{\mathcal{G}(x)}(v) > x$. In the former case, we reduce the search to the interval $[\lambda_1, x]$, and in the latter case, to the interval $[x, \lambda_2]$. Since by Lemma [?] the precision needed is [?], we need $O(\log R + k\log D)$ search steps. □

We call the procedure described in the above proof BWR-FindErgodicityInterval($\mathcal{G}(x)$).
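The binary search in the proof of Lemma 15 relies only on Lemma 12 and on the three possible outcomes of BWR-FindTop($\mathcal{G}(x)$): ergodic, a Black certificate, or a White certificate. The Python sketch below separates the search into two phases. It first finds some ergodic parameter and then shrinks towards each end-point. The oracle `classify(x)` is a hypothetical stand-in for BWR-FindTop. The tolerance `eps` replaces the precision bound of the proof, so the returned end-points are only approximations within `eps`. The toy oracle with ergodicity interval $[1, 2]$ is invented.

```python
from fractions import Fraction


def find_ergodicity_interval(classify, R, eps):
    """Approximate [tau_1, tau_2] = {x in [-R, R] : G(x) is ergodic} within eps.

    classify(x) is a hypothetical stand-in for BWR-FindTop(G(x)) returning
      'ergodic' if G(x) is ergodic,
      'left'    if Black certified a value < x (ergodic set lies left of x),
      'right'   if White certified a value > x (ergodic set lies right of x).
    Returns None if no ergodic parameter is found at resolution eps.
    """
    lo, hi = Fraction(-R), Fraction(R)
    # Phase 1: locate one ergodic parameter x0 (Lemma 12: the set is an interval).
    x0 = next((x for x in (lo, hi) if classify(x) == 'ergodic'), None)
    while x0 is None and hi - lo > eps:
        mid = (lo + hi) / 2
        verdict = classify(mid)
        if verdict == 'ergodic':
            x0 = mid
        elif verdict == 'left':
            hi = mid
        else:
            lo = mid
    if x0 is None:
        return None
    # Phase 2a: b stays ergodic, a stays non-ergodic (unless b = a = -R).
    a, b = Fraction(-R), x0
    if classify(a) == 'ergodic':
        b = a
    while b - a > eps:
        mid = (a + b) / 2
        if classify(mid) == 'ergodic':
            b = mid
        else:
            a = mid
    # Phase 2b: c stays ergodic, d stays non-ergodic (unless c = d = R).
    c, d = x0, Fraction(R)
    if classify(d) == 'ergodic':
        c = d
    while d - c > eps:
        mid = (c + d) / 2
        if classify(mid) == 'ergodic':
            c = mid
        else:
            d = mid
    return b, c


def toy_classify(x):
    """Invented oracle whose ergodicity interval is [1, 2]."""
    if x > 2:
        return 'left'
    if x < 1:
        return 'right'
    return 'ergodic'


print(find_ergodicity_interval(toy_classify, 8, Fraction(1, 1024)))  # end-points within 1/1024 of 1 and 2
```

Both returned end-points are themselves ergodic parameters, so the strategies computed at them can be combined as in Lemma 13.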

4.2 Description of main algorithm 

For a BWR-game $\mathcal{G}$ and a parameter $x \in \mathbb{R}$, define $\mathcal{G}(x)$ to be the BW-game obtained from $\mathcal{G}$ by replacing each random position $v \in V_R$ in $\mathcal{G}$ with a terminal deterministic position (black or white, arbitrarily) with a local reward of value $x$ on the self-loop $(v,v)$.

Our algorithm uses four auxiliary routines:

• BWR-FindErgodicityInterval($\mathcal{G}(x)$) and BW-FindIntervals($\mathcal{G}(x)$), which we described above;

• BWR-SolveErgodic($\mathcal{G}$), which solves a given ergodic BWR-game $\mathcal{G}$ using the pumping algorithm as in Theorem [?];

• BW-Solve($\mathcal{G}$), which solves a BW-game using, e.g., [GKK88, Pis99, ZP96].

For a position $v \in V_R$, we define $\mathrm{rank}(v) := |\{\mu_{\mathcal{G}}(u) \mid u \in V_R,\ \mu_{\mathcal{G}}(u) > \mu_{\mathcal{G}}(v)\}| + 1$. For each $v \in V_R$, our algorithm guesses its rank as $g(v)$. We remark that there are at most $k^k$ possible guesses.
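To make the definition of rank concrete, the Python sketch below computes $\mathrm{rank}(v)$ from an assignment of values to random positions. It also enumerates all maps $g : V_R \to [k]$; the correct rank map is one of these $k^k$ candidates. The values used here are invented for illustration; the algorithm itself does not know $\mu_{\mathcal{G}}$ and must try every candidate.

```python
from itertools import product


def ranks(values):
    """rank(v) = |{mu(u) : u in V_R, mu(u) > mu(v)}| + 1 for every random position v."""
    return {v: len({mu for mu in values.values() if mu > values[v]}) + 1
            for v in values}


def all_guesses(random_positions):
    """Yield every map g : V_R -> [k] as a dict; there are k**k of them."""
    k = len(random_positions)
    for labels in product(range(1, k + 1), repeat=k):
        yield dict(zip(random_positions, labels))


values = {'a': 5, 'b': 3, 'c': 5, 'd': 1}          # invented values mu_G(v), v in V_R
print(ranks(values))                                # {'a': 1, 'b': 2, 'c': 1, 'd': 3}
print(sum(1 for _ in all_guesses(list(values))))    # 256 = 4**4
```

Positions with equal values share a rank, so the number of distinct ranks equals the number of distinct values among the random positions.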

For each such guess $g : V_R \to [k]$, we call procedure BWR-Solve($\mathcal{G}, U, g, \ell, s^*[V \setminus U]$) with $U = V$ and $\ell = 1$. At any point in time, $U$ represents the set of positions for which strategies have not been fixed yet.

This procedure keeps constructing complete situations for $\mathcal{G}$ until it finds an optimal one or discovers that our current guess is not correct. Each time, we check optimality by solving two MDPs using linear programming (see, e.g., [MO70]). We will prove that for each guess $g$ the procedure will only construct $O(\nu(\mathcal{G})^{?})$ complete situations. We will also prove that it always finds an optimal one if our guess is correct.

We now describe the procedure BWR-Solve($\cdot$). For an integer $\ell \in [k]$, define $U^\ell$ to be the set of positions obtained from $U$ by removing all positions in the black closure $\mathrm{cl}_B(\{v \in V_R \mid g(v) > \ell\})$. Assuming that our guess is correct, the removed positions are those whose values are smaller than the value of the random positions of rank $\ell$.

We first form the game $\mathcal{G}[U^\ell](x)$. Then we find the set of intervals $\mathcal{I}(\mathcal{G}[U^\ell](x))$ using the routine BW-FindIntervals($\mathcal{G}[U^\ell](x)$) described above. Then, for each such interval $I = [\lambda_1(I), \lambda_2(I)]$, we consider three subgames, defined by the sets $S^+(I)$, $S^0(I)$ and $S^-(I)$.

By the definition of $S^+(I)$, the first subgame $\mathcal{G}[S^+(I)]$ is a BW-game. Hence, the optimal strategy $s^*[S^+(I)]$ in $\mathcal{G}[S^+(I)]$ can be obtained by calling BW-Solve($\mathcal{G}[S^+(I)]$).

The positions in the second subgame $\mathcal{G}[S^0(I)]$ have the same value $x$. Although we do not know the exact value of $x$, we can find the interval of ergodicity of the BWR-game $\mathcal{G}[S^0(I)](x)$ by calling procedure BWR-FindErgodicityInterval($\mathcal{G}[S^0(I)](x)$) (step 11). Once we determine this interval $[\tau_1, \tau_2]$, we solve the two ergodic games $\mathcal{G}[S^0(I)](\tau_1)$ and $\mathcal{G}[S^0(I)](\tau_2)$ using procedure BWR-SolveErgodic($\cdot$). We then combine the strategies according to Lemma 13 to obtain an optimal situation for $\mathcal{G}[S^0(I)]$.

Finally, the rest of the game is solved by calling the procedure recursively with $U := U \setminus (S^+(I) \cup S^0(I))$ and $\ell := \ell + 1$.

The following lemma states that if the guess is correct, then the procedure actually solves the 
game. 
Algorithm 3 BWR-Solve($\mathcal{G}, U, g, \ell, s^*[V \setminus U]$)

Input: A BWR-game $\mathcal{G} = (G = (V = V_B \cup V_W \cup V_R, E), p, r)$, a set of positions $U \subseteq V$, a vector of rank guesses $g : V_R \to [k]$, an integer $\ell$, and a situation $s^*[V \setminus U]$ on the set of positions $V \setminus U$

1: if $U \cap V_R = \emptyset$ then
2:     $s^*[U] :=$ BW-Solve($\mathcal{G}[U]$)
3:     if $s^*$ is optimal in $\mathcal{G}$ then
4:         output $s^*$ and halt
5:     end if
6: else
7:     $U^\ell := U \setminus \mathrm{cl}_B(\{v \in V_R \mid g(v) > \ell\})$
8:     $(\mathcal{I}, S^-, S^0, S^+) :=$ BW-FindIntervals($\mathcal{G}[U^\ell](x)$)
9:     for each $I = [\lambda_1(I), \lambda_2(I)] \in \mathcal{I}$ do
10:        $s^*[S^+(I)] :=$ BW-Solve($\mathcal{G}[S^+(I)]$)
11:        $[\tau_1, \tau_2] :=$ BWR-FindErgodicityInterval($\mathcal{G}[S^0(I)](x)$)
12:        $s^1 :=$ BWR-SolveErgodic($\mathcal{G}[S^0(I)](\tau_1)$)
13:        $s^2 :=$ BWR-SolveErgodic($\mathcal{G}[S^0(I)](\tau_2)$)
14:        $s^*[S^0(I)] := (s^2_W, s^1_B)$
15:        if $U = S^+(I) \cup S^0(I)$ then
16:            if $s^*$ is optimal in $\mathcal{G}$ then
17:                output $s^*$ and halt
18:            end if
19:        else
20:            BWR-Solve($\mathcal{G}, U \setminus (S^+(I) \cup S^0(I)), g, \ell + 1, s^*[(V \setminus U) \cup S^+(I) \cup S^0(I)]$)
21:        end if
22:    end for
23: end if


Lemma 16 Let $\mathcal{G}$ be a BWR-game. If procedure BWR-Solve($\mathcal{G}, U, g, \ell, s^*[V \setminus U]$) is called with $g(v) = \mathrm{rank}(v)$ for all $v \in V_R$, $U = V$ and $\ell = 1$, then it returns an optimal situation $s^*$.

Proof Suppose that $g$ is correct, i.e., $g(v) = \mathrm{rank}(v)$ for all $v \in V_R$. Let $d$ be a nonnegative integer such that $d \le \max_{v \in V_R} \mathrm{rank}(v)$. We prove by induction on $d$ that there is a path in the recursion tree, from the root to a node at depth $d - 1$, through which the algorithm correctly finds all the classes $S(\theta_i) := \{u \in V \mid \mu_{\mathcal{G}}(u) = \theta_i\}$ for $\theta_i \ge \mu_{\mathcal{G}}(v)$, for all $v \in V_R$ with $\mathrm{rank}(v) = d$. We also prove that the obtained situation $s^*$ induces optimal situations in $\mathcal{G}[S(\theta_i)]$ for each such $i$. This claim together with Lemma 9 would prove the theorem.

For $d = 0$ there is nothing to prove. Suppose that the claim is correct up to $d = h - 1$, and consider the path from the root of the recursion tree to a node $\mathcal{N}$ verifying this. Consider the call to procedure BWR-Solve($\cdot$) at this node. Assume that $\mu_{\mathcal{G}}(v) = x$ for all $v \in V_R$ with $\mathrm{rank}(v) = h$. Let $\mathcal{I}$ be the set of intervals computed in step 8, and let $I \in \mathcal{I}$ be the interval for which $x \in I$; this interval will eventually be chosen in step 9. All the positions in $\mathrm{cl}_B(\{v \in V_R \mid g(v) > h\})$ have value less than $x$. Hence Lemma [?](ii) implies that the values computed for all positions in the set $S^+(I) = \{v \in U^h \mid \mu_{\mathcal{G}}(v) > x\}$ are correct.

Furthermore, by the definition of $S^0(I)$, the subgame $\mathcal{G}[S^0(I)]$ is ergodic with value $x$. Hence $x$ belongs to the interval of ergodicity $[\tau_1, \tau_2]$ computed in step 11. By Lemma 13, the situation $s^*[S^0(I)]$ computed in step 14 is optimal in the subgame $\mathcal{G}[S^0(I)]$.

This completes the proof of the induction step. □

Proof of Theorem 1 The fact that the algorithm returns an optimal situation follows from the previous lemma. For the complexity analysis, note that the depth of the recursion tree is at most $k + 1$. The number of intervals tried at each recursion level is at most $2\nu(\mathcal{G}) + 1$. The running time of the local work at each edge of the recursion tree is bounded by $\mathrm{poly}(n)(2D)^{O(k)}R\log^2 R$. We have at most $(2\nu(\mathcal{G}) + 1)^{k+1}$ such edges, and hence the claimed complexity follows since $\nu(\mathcal{G}) \le n$. □
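Written out, the count in the proof above multiplies the number of recursion-tree edges by the cost per edge. For a single guess $g$, using $\nu(\mathcal{G}) \le n$, this gives

$$(2\nu(\mathcal{G})+1)^{k+1}\cdot\mathrm{poly}(n)\,(2D)^{O(k)}R\log^2 R \;\le\; (2n+1)^{k+1}\,\mathrm{poly}(n)\,(2D)^{O(k)}R\log^2 R.$$

This bound is polynomial in $n$, $D$ and $R$ whenever $k = O(1)$. The at most $k^k$ guesses $g : V_R \to [k]$ then contribute only a constant factor.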

Finally, we remark that our proof of Theorem 1 actually gives the stronger bound of $(\nu(\mathcal{G})Dk)^{O(k)}R \cdot \mathrm{poly}(n, \log R)$ for the running time of the algorithm. The definition of $\nu(\mathcal{G})$ implies the following observation.

Observation 2 For a stochastic terminal payoff game with $t$ terminals, $\nu(\mathcal{G}) \le t + 1$.

Thus, we obtain the following result. 

Corollary 2 Algorithm [?] solves any stochastic terminal payoff game with $t$ terminals in time $(tDk)^{O(k)}R \cdot \mathrm{poly}(n, \log R)$, and any simple stochastic game in time $\mathrm{poly}(n)$.